(57) Abstract

A novel polypeptide with binding affinity for the p185HER2 receptor, designated heregulin-α, has been identified and
purified from cultured human cells. DNA sequences encoding additional heregulin polypeptides, designated heregulin-α,
heregulin-β1, heregulin-β2, heregulin-β2-like, and heregulin-β3, have been isolated, sequenced and expressed. Provided
herein are nucleic acid sequences encoding the amino acid sequences of heregulins useful in the production of heregulins
by recombinant means. Further provided are the amino acid sequences of heregulins and purification methods therefor.
Heregulins and their antibodies are useful as therapeutic agents and in diagnostic methods.

HEREGULINS (HRGs), BINDING PROTEINS OF p185HER2

BACKGROUND OF THE INVENTION

This invention relates to polypeptide ligands that bind to receptors implicated in
cellular growth. In particular, it relates to polypeptide ligands that bind to the p185HER2
receptor.

Description of Background and Related Art

Cellular protooncogenes encode proteins that are thought to regulate normal cellular
proliferation and differentiation. Alterations in their structure or amplification of their
expression lead to abnormal cellular growth and have been associated with carcinogenesis
(Bishop JM, Science 235:305-311 [1987]); (Rhims JS, Cancer Detection and Prevention
11:139-149 [1988]); (Nowell PC, Cancer Res. 46:2203-2207 [1986]); (Nicolson GL, Cancer Res.
47:1473-1487 [1987]). Protooncogenes were first identified by either of two approaches.
First, molecular characterization of the genomes of transforming retroviruses showed that
the genes responsible for the transforming ability of the virus in many cases were altered
versions of genes found in the genomes of normal cells. The normal version is the
protooncogene, which is altered by mutation to give rise to the oncogene. An example of such
a gene pair is represented by the EGF receptor and the v-erb-B gene product. The virally
encoded v-erb-B gene product has suffered truncation and other alterations that render it
constitutively active and endow it with the ability to induce cellular transformation (Yarden
et al., Ann. Rev. Biochem. 57:443-478 [1988]).

The second method for detecting cellular transforming genes that behave in a
dominant fashion involves transfection of cellular DNA from tumor cells of various species
into nontransformed target cells of a heterologous species. Most often this was done by
transfection of human, avian, or rat DNAs into the murine NIH 3T3 cell line (Bishop JM,
Science 235:305-311 [1987]); (Rhims JS, Cancer Detection and Prevention 11:139-149 [1988]);
(Nowell PC, Cancer Res. 46:2203-2207 [1986]); (Nicolson GL, Cancer Res. 47:1473-1487
[1987]); (Yarden et al., Ann. Rev. Biochem. 57:443-478 [1988]). Following several cycles of
genomic DNA isolation and retransfection, the human or other species DNA was molecularly
cloned from the murine background and subsequently characterized. In some cases, the same
genes were isolated following transfection and cloning as those identified by the direct
characterization of transforming viruses. In other cases, novel oncogenes were identified. An
example of a novel oncogene identified by this transfection assay is the neu oncogene. It was
discovered by Weinberg and colleagues in a transfection experiment in which the initial DNA
was derived from a carcinogen-induced rat neuroblastoma (Padhy et al., Cell 28:865-871
[1982]); (Schechter et al., Nature 312:513-516 [1984]). Characterization of the rat neu
oncogene revealed that it had the structure of a growth factor receptor tyrosine kinase, had
homology to the EGF receptor, and differed from its normal counterpart, the neu
protooncogene, by an activating mutation in its transmembrane domain (Bargmann et al., Cell
45:649-657 [1986]). The human counterpart to neu is the HER2 protooncogene, also designated
c-erb-B2 (Coussens et al., Science 230:1137-1139 [1985]; WO89/06692).

The association of the HER2 protooncogene with cancer was established by yet a
third approach, that is, its association with human breast cancer. The HER2 protooncogene
was first discovered in cDNA libraries by virtue of its homology with the EGF receptor, with
which it shares structural similarities throughout (Yarden et al., Ann. Rev. Biochem.
57:443-478 [1988]). When radioactive probes derived from the cDNA sequence encoding p185HER2
were used to screen DNA samples from breast cancer patients, amplification of the HER2
protooncogene was observed in about 30% of the patient samples (Slamon et al., Science
235:177-182 [1987]). Further studies have confirmed this original observation and extended it
to suggest an important correlation between HER2 protooncogene amplification and/or
overexpression and worsened prognosis in ovarian cancer and non-small cell lung cancer
(Slamon et al., Science 244:707-712 [1989]); (Wright et al., Cancer Res. 49:2087-2090 [1989]);
(Paik et al., J. Clin. Oncology 8:103-112 [1990]); (Berchuck et al., Cancer Res. 50:4087-4091
[1990]); (Kern et al., Cancer Res. 50:5184-5191 [1990]).

The association of HER2 amplification/overexpression with aggressive malignancy,
as described above, implies that it may have an important role in progression of human
cancer; however, many tumor-related cell surface antigens have been described in the past,
few of which appear to have a direct role in the genesis or progression of disease (Schlom et
al., Cancer Res. 50:820-827 [1990]); (Szala et al., Proc. Natl. Acad. Sci. 98:3542-3546).

Among the protooncogenes are those that encode cellular growth factors which act
through endoplasmic kinase phosphorylation of cytoplasmic protein. The HER1 gene (or
erb-B1) encodes the epidermal growth factor (EGF) receptor. The β-chain of platelet-derived
growth factor is encoded by the c-sis gene. The granulocyte-macrophage colony stimulating
factor is encoded by the c-fms gene. The neu protooncogene has been identified in
ethylnitrosourea-induced rat neuroblastomas. The HER2 gene encodes the 1,255 amino acid
tyrosine kinase receptor-like glycoprotein p185HER2 that has homology to the human epidermal
growth factor receptor.

The known receptor tyrosine kinases all have the same general structural motif: an
extracellular domain that binds ligand, and an intracellular tyrosine kinase domain that is
necessary for signal transduction and transformation. These two domains are connected by
a single stretch of approximately 20 mostly hydrophobic amino acids, called the
transmembrane spanning sequence. This transmembrane spanning sequence is thought to
play a role in transferring the signal generated by ligand binding from the outside of the cell to
the inside. Consistent with this general structure, the human p185HER2 glycoprotein, which is
located on the cell surface, may be divided into three principal portions: an extracellular
domain, or ECD (also known as XCD); a transmembrane spanning sequence; and a
cytoplasmic, intracellular tyrosine kinase domain. While it is presumed that the extracellular
domain is a ligand receptor, the p185HER2 ligand has not yet been positively identified.
No specific ligand binding to p185HER2 has been identified, although Lupu et al.
(Science 249:1552-1555, 1990) describe an inhibitory 30 kDa glycoprotein secreted from human
breast cancer cells which is alleged to be a putative ligand for p185HER2. Lupu et al. (Science
249:1552-1555 [1990]; Proceedings of the American Assoc. for Cancer Research, Vol. 32, Abs.
297, March 1991) reported the purification of a 30 kD factor from MDA-MB-231 cells and a 75
kD factor from SK-BR-3 cells that stimulates p185HER2. The 75 kD factor reportedly induced
phosphorylation of p185HER2 and modulated cell proliferation and colony formation of SK-BR-3
cells overexpressing the p185HER2 receptor. The 30 kD factor competes with muMab 4D5 for
binding to p185HER2; its growth effect on SK-BR-3 cells was dependent on 30 kD
concentration (stimulatory at low concentrations and inhibitory at higher concentrations).
Furthermore, it stimulated the growth of MDA-MB-468 cells (EGF-R positive, p185HER2
negative), it stimulated phosphorylation of the EGF receptor and it could be obtained from
SK-BR-3 cells. In the rat neu system, Yarden et al. (Biochemistry 30:3543-3550, 1991) describe
a 35 kDa glycoprotein candidate ligand for the neu encoded receptor secreted by ras
transformed fibroblasts. Dobashi et al. (Proc. Natl. Acad. Sci. USA 88:8582-8586 [1991];
Biochem. Biophys. Res. Commun. 179:1536-1542 [1991]) described a neu protein-specific
activating factor (NAF) which is secreted by human T-cell line ATL-2 and which has a
molecular weight in the range of 8-24 kD. A 25 kD ligand from activated macrophages was
also described (Tarakhovsky et al., J. Cancer Res., 2188-2196 [1991]).

Methods for the in vivo assay of tumors using HER2 specific monoclonal antibodies
and methods of treating tumor cells using HER2 specific monoclonal antibodies are described in
WO89/06692.

There is a current and continuing need in the art to identify the actual ligand or ligands
that activate p185HER2, and to identify their biological role(s), including their roles in cell
growth and differentiation, cell transformation and the creation of malignant neoplasms.

Accordingly, it is an object of this invention to identify and purify one or more novel
p185HER2 ligand polypeptide(s) that bind and stimulate p185HER2.

It is another object to provide nucleic acid encoding novel p185HER2 binding ligand
polypeptides and to use this nucleic acid to produce a p185HER2 binding ligand polypeptide in
recombinant cell culture for therapeutic or diagnostic use, and for the production of therapeutic
antagonists for use in certain metabolic disorders including, but not necessarily restricted to,
the killing, inhibition and/or diagnostic imaging of tumors and tumorigenic cells.

It is a further object to provide derivatives and modified forms of novel glycoprotein
ligands, including amino acid sequence variants, fusion polypeptides combining a p185HER2
binding ligand and a heterologous protein, and covalent derivatives of a p185HER2 binding ligand.


WO 92/20798    PCT/US92/04295

It is an additional object to prepare immunogens for raising antibodies against
p185HER2 binding ligands, as well as to obtain antibodies capable of binding to such ligands, and
antibodies which bind a p185HER2 binding ligand and prevent the ligand from activating
p185HER2. It is a further object to prepare immunogens comprising a p185HER2 binding ligand
fused with an immunogenic heterologous polypeptide.

These and other objects of the invention will be apparent to the ordinary artisan upon
consideration of the specification as a whole.

SUMMARY OF THE INVENTION

In accordance with the objects of this invention, we have identified and isolated novel
ligand families which bind to p185HER2. These ligands are denominated the heregulin (HRG)
polypeptides, and include HRG-α, HRG-β1, HRG-β2, HRG-β3 and other HRG polypeptides
which cross-react with antibodies directed against these family members and/or which are
substantially homologous as defined infra. A preferred HRG is the ligand disclosed in Fig. 4
and its fragments, further designated HRG-α. Other preferred HRGs are the ligands and
their fragments disclosed in Figure 8 and designated HRG-β1, HRG-β2 disclosed in Figure
12, and HRG-β3 disclosed in Figure 13.

In another aspect, the invention provides a composition comprising HRG which is
isolated from its source environment, in particular HRG that is free of contaminating human
polypeptides. HRG is purified by adsorption to heparin Sepharose, cation (e.g. polyaspartic
acid) exchange resins, and reversed phase HPLC.

HRG or HRG fragments (which also may be synthesized by in vitro methods) are
fused (by recombinant expression or an in vitro peptidyl bond) to an immunogenic polypeptide
and this fusion polypeptide, in turn, is used to raise antibodies against an HRG epitope.
Anti-HRG antibodies are recovered from the serum of immunized animals. Alternatively,
monoclonal antibodies are prepared from cells in vitro or from in vivo immunized animals in
conventional fashion. Preferred antibodies identified by routine screening will bind to HRG, but
will not substantially cross-react with any other known ligands such as EGF, and will prevent
HRG from activating p185HER2. In addition, anti-HRG antibodies are selected that are
capable of binding specifically to individual family members of the HRG family, e.g. HRG-α,
HRG-β1, HRG-β2, HRG-β3, and thereby may act as specific antagonists thereof.

HRG also is derivatized in vitro to prepare immobilized HRG and labeled HRG,
particularly for purposes of diagnosis of HRG or its antibodies, or for affinity purification of
HRG antibodies. Immobilized anti-HRG antibodies are useful in the diagnosis (in vitro or in
vivo) or purification of HRG. In one preferred embodiment, a mixture of HRG and other
peptides is passed over a column to which the anti-HRG antibodies are bound.

Substitutional, deletional, or insertional variants of HRG are prepared by in vitro or
recombinant methods and screened, for example, for immuno-crossreactivity with the native
forms of HRG and for HRG antagonist or agonist activity.

In another preferred embodiment, HRG is used for stimulating the activity of
p185HER2 in normal cells. In another preferred embodiment, a variant of HRG is used as an
antagonist to inhibit stimulation of p185HER2.

HRG, its derivatives, or its antibodies are formulated into physiologically acceptable
vehicles, especially for therapeutic use. Such vehicles include sustained-release formulations
of HRG or HRG variants. A composition is also provided comprising HRG and a
pharmaceutically acceptable carrier, and an isolated polypeptide comprising HRG fused to a
heterologous polypeptide.

In still other aspects, the invention provides an isolated nucleic acid encoding an HRG,
which nucleic acid may be labeled or unlabeled with a detectable moiety, and a nucleic acid
sequence that is complementary, or hybridizes under stringent conditions to, a nucleic acid
sequence encoding an HRG.

The nucleic acid sequence is also useful in hybridization assays for HRG nucleic acid
and in a method of determining the presence of an HRG, comprising hybridizing the DNA (or
RNA) encoding (or complementary to) an HRG to a test sample nucleic acid and determining
the presence of an HRG. The invention also provides a method of amplifying a nucleic acid
test sample comprising priming a nucleic acid polymerase (chain) reaction with nucleic acid
(DNA or RNA) encoding (or complementary to) an HRG.

In still further aspects, the nucleic acid is DNA and further comprises a replicable
vector comprising the nucleic acid encoding an HRG operably linked to control sequences
recognized by a host transformed by the vector; host cells transformed with the vector; and a
method of using a nucleic acid encoding an HRG to effect the production of HRG, comprising
expressing HRG nucleic acid in a culture of the transformed host cells and recovering an HRG
from the host cell culture.
In further embodiments, the invention provides a method for producing HRG comprising
inserting into the DNA of a cell containing the nucleic acid encoding an HRG a transcription
modulatory element in sufficient proximity and orientation to an HRG nucleic acid to influence
(suppress or stimulate) transcription thereof, with an optional further step comprising culturing
the cell containing the transcription modulatory element and an HRG nucleic acid.
In still further embodiments, the invention provides a cell comprising the nucleic acid encoding
an HRG and an exogenous transcription modulatory element in sufficient proximity and orientation to
an HRG nucleic acid to influence transcription thereof; and a host cell containing the nucleic acid
encoding an HRG operably linked to exogenous control sequences recognized by the host cell.

BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 Purification of Heregulin on PolyAspartic Acid column.

PolyAspartic acid column chromatography of heregulin-α was conducted and the elution
profile of proteins measured at A214. The 0.6 M NaCl pool from the heparin Sepharose
purification step was diluted to 0.2 M NaCl with water and loaded onto the polyaspartic acid
column equilibrated in 17 mM Na phosphate, pH 6.8 with 30% ethanol. A linear NaCl gradient
from 0.3 to 0.6 M was initiated at 0 time and was complete at 30 minutes. Fractions were
tested in HRG tyrosine autophosphorylation assay. The fractions corresponding to peak C
were pooled for further purification on C4 reversed phase HPLC.
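
The arithmetic implied by the Figure 1 conditions can be sketched as follows. This is only an illustrative calculation, not part of the disclosed procedure: the function names and the 100 mL pool volume are hypothetical, and the only values taken from the text are the 0.6 M and 0.2 M NaCl concentrations and the 0.3 to 0.6 M gradient run over 30 minutes.

```python
def dilution_factor(c_start_molar, c_target_molar):
    """Fold dilution needed to bring c_start down to c_target."""
    if not 0 < c_target_molar <= c_start_molar:
        raise ValueError("target must be positive and no greater than start")
    return c_start_molar / c_target_molar


def water_to_add(pool_volume_ml, c_start_molar, c_target_molar):
    """Volume of water (mL) to add to a pool to reach the target molarity."""
    return pool_volume_ml * (dilution_factor(c_start_molar, c_target_molar) - 1.0)


def gradient_concentration(t_min, c_begin=0.3, c_end=0.6, duration_min=30.0):
    """NaCl molarity at time t for a linear gradient (clamped to its range)."""
    t = min(max(t_min, 0.0), duration_min)
    return c_begin + (c_end - c_begin) * t / duration_min


if __name__ == "__main__":
    # 0.6 M pool diluted to 0.2 M: a three-fold dilution.
    print(f"{dilution_factor(0.6, 0.2):.1f}")        # 3.0
    # Hypothetical 100 mL pool: two volumes of water are added.
    print(f"{water_to_add(100.0, 0.6, 0.2):.0f}")    # 200
    # Midpoint of the 30 minute gradient.
    print(f"{gradient_concentration(15.0):.2f}")     # 0.45
```

Under these assumptions the gradient rises by 0.01 M NaCl per minute, so a fraction's collection time maps directly onto an approximate salt concentration.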

Figure 2 C4 Reversed Phase Purification of Heregulin-α.

Panel A: Pool C from the polyaspartic acid column was applied to a C4 HPLC
column (SynChropak RP-4) equilibrated in 0.1% TFA and the proteins eluted with a
linear acetonitrile gradient at 0.25%/minute. The absorbance trace for the run
numbered C4-17 is shown. One milliliter fractions were collected for assay.

Panel B: Ten microliter aliquots of the fractions were tested in HRG tyrosine
autophosphorylation assay. Levels of phosphotyrosine in the p185HER2 protein were
quantitated by a specific antiphosphotyrosine antibody and displayed in arbitrary
units on the abscissa.

Panel C: Ten microliter fractions were taken and subjected to SDS gel
electrophoresis on 4-20% acrylamide gradient gels according to the procedure of
Laemmli (Nature, 227:680-685, 1970). The molecular weights of the standard proteins
are indicated to the left of the lane containing the standards. The major peak of
tyrosine phosphorylation activity found in fraction 17 was associated with a
prominent 45,000 Da band (HRG-α).

Figure 3 SDS Polyacrylamide Gel Showing Purification of Heregulin-α.

Molecular weight markers are shown in Lane 1. Aliquots from the MDA-MB-231
conditioned media (Lane 2), the 0.6 M NaCl pool from the heparin Sepharose column (Lane 3),
Pool C from the polyaspartic acid column (Lane 4) and Fraction 17 from the HPLC column
(C4-17) (Lane 5) were electrophoresed on a 4-20% gradient gel and silver stained. Lanes 6
and 7 contained buffer only and show the presence of gel artifacts in the 50-65 kDa
molecular weight region.

Figures 4a-4d depict the deduced amino acid sequence of the cDNA contained in λgt10her16
(SEQ ID NO:12 and SEQ ID NO:13). The nucleotides are numbered at the top left of each line
and the amino acids written in three letter code are numbered at the bottom left of each line.
The nucleotide sequence corresponding to the probe is nucleotides 681-720. The probable
transmembrane domain is amino acids 287-309. The six cysteines of the EGF motif are 226,
234, 240, 254, 256 and 265. The five potential three-amino acid N-linked glycosylation sites
are 164-166, 170-172, 208-210, 437-439 and 609-611. The serine-threonine potential
O-glycosylation sites are 209-221. Serine-glycine dipeptide potential glycosaminoglycan addition
sites are amino acids 42-43, 64-65 and 151-152. The initiating methionine (MET) is at position
#45 of figure 4 although the processed N-terminal residue is S46.
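
Three-residue N-linked glycosylation sites of the kind listed for Figure 4 are conventionally located by scanning for the sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below illustrates such a scan. It assumes that general convention (the text does not state which rule was applied), the function name is hypothetical, and the example sequence is a made-up toy string, not an HRG sequence.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The lookahead makes the
# match zero-width so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_n_glyc_sites(protein, first_residue_number=1):
    """Return (start, end) residue numbers of each potential N-X-S/T site."""
    sites = []
    for match in SEQUON.finditer(protein.upper()):
        start = match.start() + first_residue_number
        sites.append((start, start + 2))
    return sites


if __name__ == "__main__":
    toy = "MANGSAKNPTQNLT"  # hypothetical sequence for illustration only
    # N3-G-S and N12-L-T qualify; N8-P-T is excluded because X is proline.
    print(find_n_glyc_sites(toy))  # [(3, 5), (12, 14)]
```

The `first_residue_number` argument lets the reported positions follow whichever numbering convention a given figure uses.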

Figure 5 Northern blot analysis of MDA-MB-231 and SKBR3 RNAs. Labeled from left to
right are the following: 1) MDA-MB-231 polyA minus-RNA (RNA remaining after
polyA-containing RNA is removed); 2) MDA-MB-231 polyA plus-mRNA (RNA which contains polyA);
3) SKBR3 polyA minus-RNA; and 4) SKBR3 polyA plus-mRNA. The probe used for this
analysis was a radioactively (32P) labelled internal XhoI DNA restriction endonuclease
fragment from the cDNA portion of λgt10her16.

Figure 6 Sequence Comparisons in the EGF Family of Proteins.

Sequences of several EGF-like proteins (SEQ ID NOS: 14, 15, 16, 17, 18, and 19)
around the cysteine domain are aligned with the sequence of HRG-α. The location in figure 6
of the cysteines and the invariant glycine and arginine residues at positions 238 and 264
clearly show that HRG-α is a member of the EGF family. The region in figure 6 of highest
amino acid identity of the family members relative to HRG-α (30-40%) is found between Cys
234 and Cys 265. The strongest identity (40%) is with the heparin-binding EGF (HB-EGF)
species. HRG-α has a unique 3 amino acid insert between Cys 240 and Cys 254. Potential
transmembrane domains are boxed (287-309). Bars indicate the carboxy-terminal sites for
EGF and TGF-alpha where proteolytic cleavage detaches the mature growth factors from
their transmembrane associated preforms. HB-EGF is heparin binding-epidermal growth
factor; EGF is epidermal growth factor; TGF-alpha is transforming growth factor alpha; and
schwannoma is the schwannoma-derived growth factor. The residue numbers in Fig. 6 reflect
the Fig. 4 convention.

Figure 7 Stimulation of Cell Growth by HRG-α.

Three different cell lines were tested for growth responses to 1 nM HRG-α. Cell
protein was quantitated by crystal violet staining and the responses normalized to control,
untreated cells.

Figures 8a-8d (SEQ ID NO:7) depict the entire potential coding DNA nucleotide sequence of
heregulin-β1 and the deduced amino acid sequence of the cDNA contained in λher11.1dbl
(SEQ ID NO:9). The nucleotides are numbered at the top left of each line and the amino acids
written in three letter code are numbered at the bottom left of each line. The probable
transmembrane amino acid domain is amino acids 278-300. The six cysteines of the EGF
motif are 212, 220, 226, 240, 242 and 251. The five potential three-amino acid N-linked
glycosylation sites are 150-152, 156-158, 196-198, 428-430 and 600-612. The serine-threonine
potential O-glycosylation sites are 195-207. Serine-glycine dipeptide potential
glycosaminoglycan addition sites are amino acids 28-29, 50-51 and 137-138. The initiating
methionine (MET) is at position #31. HRG-β1 is processed to the N-terminal residue S32.

Figure 9 depicts a comparison of the amino acid sequences of heregulin-α and -β1. A dash (-)
indicates no amino acid at that position (SEQ ID NO:8 and SEQ ID NO:9). This figure uses the
numbering convention of Figs. 4 and 6.

Figure 10 shows the stimulation of HER2 autophosphorylation using recombinant HRG-α as
measured by HER2 tyrosine phosphorylation.

Figure 11 depicts the nucleotide and imputed amino acid sequence of λ15'her13 (SEQ ID NO:22);
the amino acid residue numbering convention is unique to this figure.

Figures 12a-12e depict the nucleotide sequence of λher76, encoding heregulin-β2 (SEQ ID
NO:23). This figure commences amino acid residue numbering with the expressed N-terminal
MET; the N-terminus is S2.

Figures 13a-13c depict the nucleotide sequence of λher78, encoding heregulin-β3 (SEQ ID
NO:24). This figure uses the amino acid numbering convention of Fig. 12; S2 is the processed
N-terminus.

Figures 14a-14d depict the nucleotide sequence of λher84, encoding a heregulin-β2-like
polypeptide (SEQ ID NO:25). This figure uses the amino acid numbering convention of Fig. 12;
S2 is the processed N-terminus.

Figures 15a-15c depict the amino acid homologies between the known heregulins (α, β1, β2, β2-like
and β3 in descending order) and illustrate the amino acid insertions, deletions or substitutions that
distinguish the different forms (SEQ ID NOS:26-30). This figure uses the amino acid numbering
convention of Figs. 12-14.



DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

I. Definitions

In general, the following words or phrases have the indicated definition when used in
the description, examples, and claims.

Heregulin ('HRG') is defined herein to be any isolated polypeptide sequence which
possesses a biological activity of a polypeptide disclosed in Figs. 4, 8, 12, 13, or 15, and
fragments, alleles or animal analogues thereof. HRG excludes any
polypeptide heretofore identified, including any known polypeptide which is otherwise
anticipatory under 35 U.S.C. 102, as well as polypeptides obvious over such known
polypeptides under 35 U.S.C. 103, including in particular EGF, TGF-α, amphiregulin (Plowman
et al., Mol. Cell. Biol. 10:1969 [1990]), HB-EGF (Higashiyama et al., Science 251:936 [1991]),
schwannoma factor or polypeptides obvious thereover.

'Biological activity' for the purposes herein means an in vivo effector or antigenic
function that is directly or indirectly performed by an HRG polypeptide (whether in its native
or denatured conformation), or by any subsequence thereof. Effector functions include
receptor binding or activation, induction of differentiation, mitogenic or growth promoting
activity, immune modulation, DNA regulatory functions and the like, whether presently known
or inherent. Antigenic functions include possession of an epitope or antigenic site that is
capable of cross-reacting with antibodies raised against a naturally occurring or denatured
HRG polypeptide or fragment thereof.

Biologically active HRG includes polypeptides having both an effector and antigenic
function, or only one of such functions. HRG includes antagonist polypeptides to HRG,
provided that such antagonists include an epitope of a native HRG. A principal known
effector function of HRG is its ability to bind to p185HER2 and activate the receptor tyrosine
kinase.

HRG includes the translated amino acid sequence of full length human HRGs
(proHRG) set forth herein in the Figures; deglycosylated or unglycosylated derivatives; amino
acid sequence variants; and covalent derivatives of HRG, provided that they possess
biological activity. While the native proform of HRG is probably a membrane-bound
polypeptide, soluble forms, such as those forms lacking a functional transmembrane domain
(proHRG or its fragments), are also included within this definition.

Fragments of intact HRG are included within the definition of HRG. Two principal
domains are included within the fragments. These are the growth factor domain ('GFD'),
homologous to the EGF family and located at about residues S216-A227 to N268-R286 (Fig. 9,
HRG-α; the GFD domains for other HRGs (Fig. 15) are the homologous sequences).
Preferably, the GFDs for HRG-α, β1, β2, β2-like and β3 are, respectively, G175-K241,
G175-K246, G175-K238, G175-K238 and G175-E241 (Fig. 15).

Another fragment of interest is the N-terminal domain ('NTD'). The NTD extends
from the N-terminus of processed HRG (S2) to the residue adjacent to an N-terminal residue
of the GFD, i.e., about T172-C182 (Fig. 15) and preferably T174. An additional group of
fragments are NTD-GFD domains, equivalent to the extracellular domains of HRG-α and
β1-β2. Another fragment is the C-terminal peptide ('CTP') located about 20 residues N-terminal
to the first residue of the transmembrane domain, either alone or in combination with the
C-terminal remainder of the HRG.

In preferred embodiments, antigenically active HRG is a polypeptide that binds with an
affinity of at least about 10^ l/mole to an antibody raised against a naturally occurring HRG
sequence. Ordinarily the polypeptide binds with an affinity of at least about 10^ l/mole. Most
preferably, the antigenically active HRG is a polypeptide that binds to an antibody raised
against one of HRGs in its native conformation. HRG in its native conformation generally is
HRG as found in nature which has not been denatured by chaotropic agents, heat or other
treatment that substantially modifies the three dimensional structure of HRG as determined,
for example, by migration on nonreducing, nondenaturing sizing gels. Antibody used in this
determination is rabbit polyclonal antibody raised by formulating native HRG from a
non-rabbit species in Freund's complete adjuvant, subcutaneously injecting the formulation into
rabbits, and boosting the immune response by intraperitoneal injection of the formulation until
the titer of anti-HRG antibody plateaus.

Ordinarily, biologically active HRG will have an amino acid sequence having at least
75% amino acid sequence identity with an HRG sequence, more preferably at least 80%, even
more preferably at least 90%, and most preferably at least 95%. Identity or homology with
respect to an HRG sequence is defined herein as the percentage of amino acid residues in the
candidate sequence that are identical with HRG residues in Fig. 15, after aligning the
sequences and introducing gaps, if necessary, to achieve the maximum percent homology, and
not considering any conservative substitutions to be identical residues. None of N-terminal,
C-terminal or internal extensions, deletions, or insertions into HRG sequence shall be construed
as affecting homology.
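
One way to read the identity definition above is sketched below, assuming the two sequences have already been aligned and padded with gap characters. This is an interpretation, not the specification's own procedure: the function name is hypothetical, columns containing a gap in either sequence are skipped on the reading that extensions, deletions and insertions do not affect homology, and conservative substitutions count as mismatches. The example strings are toy sequences, not HRG sequences.

```python
def percent_identity(candidate_aln, reference_aln, gap="-"):
    """Percent of aligned residue pairs that are identical.

    Both inputs must be pre-aligned strings of equal length. Columns with a
    gap in either sequence are ignored; only exact matches count as identical.
    """
    if len(candidate_aln) != len(reference_aln):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for cand, ref in zip(candidate_aln.upper(), reference_aln.upper()):
        if cand == gap or ref == gap:
            continue  # insertion or deletion: not counted either way
        compared += 1
        if cand == ref:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy alignment: six compared columns, five identical (F vs Y differs).
    score = percent_identity("ACD-EFG", "ACDKEYG")
    print(f"{score:.1f}")   # 83.3
    print(score >= 75.0)    # True: meets the 75% threshold named in the text
```

Producing the alignment itself (the step that maximizes percent homology) is outside this sketch and would need a separate alignment routine.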

Thus, the biologically active HRG polypeptides that are the subject of this invention
include each expressed or processed HRG sequence; fragments thereof having a consecutive
sequence of at least 5, 10, 15, 20, 25, 30 or 40 amino acid residues; amino acid sequence
variants of HRG wherein an amino acid residue has been inserted N- or C-terminal to, or
within, HRG sequence or its fragment as defined above; amino acid sequence variants of HRG
sequence or its fragment as defined above wherein a residue has been substituted by another
residue. HRG polypeptides include those containing predetermined mutations by, e.g.,
site-directed or PCR mutagenesis. HRG includes HRG from such species as rabbit, rat,
porcine, non-human primate, equine, murine, and ovine HRG and alleles or other naturally
occurring variants of the foregoing; derivatives of HRG or its fragments as defined above
wherein HRG or its fragments have been covalently modified by substitution, chemical,
enzymatic, or other appropriate means with a moiety other than a naturally occurring amino
acid (for example a detectable moiety such as an enzyme or radioisotope); glycosylation
variants of HRG (insertion of a glycosylation site or deletion of any glycosylation site by
deletion, insertion or substitution of an appropriate residue); and soluble forms of HRG, such
as HRG-GFD or those that lack a functional transmembrane domain.

Of particular interest are fusion proteins that contain HRG-NTD but are free of the
GFD ordinarily associated with the HRG-NTD in question. The first 23 amino acids of the
NTD are dominated by charged residues and contain a sequence (GKKKER; residues 13-18,
Fig. 15) that closely resembles the consensus sequence motif for nuclear targeting (Roberts,
Biochim. Biophys. Acta 1008:263 [1989]). Accordingly, the HRG includes fusions in which the
NTD, or at least a polypeptide comprising its first about 23 residues, is fused at a terminus
to a non-HRG polypeptide or to a GFD of another HRG family member. The non-HRG
polypeptide in this embodiment is a regulatory protein, a growth factor such as EGF or
TGF-α, or a polypeptide ligand that binds to a cell receptor, particularly a cell surface receptor
found on the surface of a cell whose regulation is desired, e.g. a cancer cell.

In another embodiment, one or more of residues 13-18 independently are varied to
produce a sequence incapable of nuclear targeting. For example, G13 is mutated to any other
naturally occurring residue including P, L, I, V, A, M, F, K, D or S; any one or more of K14-K16
are mutated to any other naturally occurring residue including R, H, D, E, N or Q; E17 to any
other naturally occurring residue including D, R, K, H, N or Q; and R18 to any other naturally
occurring residue including K, H, D, E, N or Q. All or any one of residues 13-18 are deleted as
well, or extraneous residues are inserted adjacent to these residues; for example, residues
inserted adjacent to residues 13-18 which are the same as the above-suggested substitutions
for the residues themselves.
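
The substitutions listed for residues 13-18 can be enumerated mechanically. The following is a minimal Python sketch, not part of the original disclosure: the names MOTIF, SUGGESTED and single_substitutions are hypothetical, and only the single-residue replacements named in the text are covered (deletions and insertions are omitted).

```python
# Illustrative sketch: enumerate single-residue substitutions in the
# putative nuclear-targeting motif GKKKER (residues 13-18, per the text).
MOTIF = "GKKKER"
MOTIF_START = 13  # residue number of the first motif position

# Replacement residues named in the text for each position.
SUGGESTED = {
    13: "PLIVAMFKDS",
    14: "RHDENQ",
    15: "RHDENQ",
    16: "RHDENQ",
    17: "DRKHNQ",
    18: "KHDENQ",
}


def single_substitutions(motif=MOTIF, start=MOTIF_START, table=SUGGESTED):
    """Return (label, mutated_motif) pairs, e.g. ("G13P", "PKKKER")."""
    variants = []
    for offset, native in enumerate(motif):
        position = start + offset
        for replacement in table[position]:
            mutated = motif[:offset] + replacement + motif[offset + 1:]
            variants.append((f"{native}{position}{replacement}", mutated))
    return variants


if __name__ == "__main__":
    for label, mutated in single_substitutions():
        print(label, mutated)
```

Each label follows the native-residue, position, replacement convention already used in the text (for example F216 substituted by Y would be written F216Y).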

In another embodiment, an enzyme or a nuclear regulatory protein such as a
transcriptional regulatory factor is fused to HRG-NTD, HRG-NTD-GFD, or HRG-GFD. The
enzyme or factor is fused to the N- or C-terminus, or inserted between the NTD and GFD
domains, or is substituted for the region of NTD between about the first 23 residues and the
GFD.

"Isolated" HRG means HRG which has been identified and is free of components of its
natural environment. Contaminant components of its natural environment include materials
which would interfere with diagnostic or therapeutic uses for HRG, and may include proteins,
hormones, and other substances. In preferred embodiments, HRG will be purified (1) to
greater than 95% by weight of protein as determined by the Lowry method or other validated
protein determination method, and most preferably more than 99% by weight, (2) to a degree
sufficient to obtain at least 15 residues of N-terminal or internal amino acid sequence by use
of the best commercially available amino acid sequenator marketed on the filing date hereof,
or (3) to homogeneity by SDS-PAGE using Coomassie blue or, preferably, silver stain.
Isolated HRG includes HRG in situ within heterologous recombinant cells since at least one
component of the HRG natural environment will not be present. Isolated HRG includes HRG from
one species in a recombinant cell culture of another species since HRG in such circumstances
will be devoid of source polypeptides. Ordinarily, however, isolated HRG will be prepared by
at least one purification step.

In accordance with this invention, HRG nucleic acid is RNA or DNA containing greater
than ten bases that encodes a biologically or antigenically active HRG, is complementary to
nucleic acid sequence encoding such HRG, or hybridizes to nucleic acid sequence encoding such
HRG and remains stably bound to it under stringent conditions.

Preferably, HRG nucleic acid encodes a polypeptide sharing at least 75% sequence
identity, more preferably at least 80%, still more preferably at least 85%, even more
preferably at least 90%, and most preferably 95%, with an HRG sequence. Preferably, the HRG
nucleic acid that hybridizes contains at least 20, more preferably at least about 40, and most
preferably at least about 90 bases. Such hybridizing or complementary nucleic acid, however,
is further defined as being novel under 35 U.S.C. 102 and unobvious under 35 U.S.C. 103 over
any prior art nucleic acid and excludes nucleic acid encoding EGF, TGF-α, amphiregulin,
HB-EGF, schwannoma factor or fragments or variants thereof which would have been obvious as
of the filing date hereof.
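
As an illustration of the identity thresholds recited above, the following Python sketch computes percent identity between two sequences that have already been aligned. It is an assumption-laden example rather than the method used herein: the text does not specify an alignment procedure or how gaps are counted, so the function names, the "-" gap symbol, and the choice to count every column in which at least one sequence has a residue are all illustrative.

```python
# Illustrative sketch: percent identity over a pre-computed pairwise alignment.
# Assumption: columns where both sequences carry a gap are ignored; every other
# column counts toward the denominator.
TIERS = (95, 90, 85, 80, 75)  # preference thresholds recited in the text


def percent_identity(aligned_a, aligned_b, gap="-"):
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # skip columns that are gaps in both sequences
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def highest_tier_met(identity):
    """Return the highest recited threshold satisfied, or None if below 75%."""
    for tier in TIERS:
        if identity >= tier:
            return tier
    return None


if __name__ == "__main__":
    identity = percent_identity("GKKKER", "GRKKER")
    print(round(identity, 1), highest_tier_met(identity))
```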

Isolated HRG nucleic acid includes a nucleic acid that is free from at least one
contaminant nucleic acid with which it is ordinarily associated in the natural source of HRG
nucleic acid. Isolated HRG nucleic acid thus is present in other than the form or setting in
which it is found in nature. However, isolated HRG-encoding nucleic acid includes HRG nucleic
acid in ordinarily HRG-expressing cells where the nucleic acid is in a chromosomal location
different from that of natural cells or is otherwise flanked by a different DNA sequence than
that found in nature. Nucleic acid encoding HRG may be used in specific hybridization assays,
particularly those portions of HRG-encoding sequence that do not hybridize with other known
DNA sequences, for example those encoding the EGF-like molecules of figure 6.

"Stringent conditions" are those that (1) employ low ionic strength and high
temperature for washing, for example, 0.015 M NaCl/0.0015 M sodium citrate/0.1%
NaDodSO4 at 50°C; (2) employ during hybridization a denaturing agent such as formamide,
for example, 50% (vol/vol) formamide with 0.1% bovine serum albumin, 0.1% Ficoll, 0.1%
polyvinylpyrrolidone, 50 mM sodium phosphate buffer at pH 6.5 with 750 mM NaCl, 75 mM
sodium citrate at 42°C; or (3) employ 50% formamide, 5 x SSC (0.75 M NaCl, 0.075 M
sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5 x
Denhardt's solution, sonicated salmon sperm DNA (50 μg/ml), 0.1% SDS, and 10% dextran
sulfate at 42°C, with washes at 42°C in 0.2 x SSC and 0.1% SDS.
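
The salt concentrations in these conditions are simple multiples of one another. The short Python sketch below, offered only as a worked example, derives the 1 x SSC composition from the 5 x SSC figures given in condition (3) and scales it; the names SSC_1X_MOLAR and ssc_molarity are hypothetical.

```python
# Illustrative sketch: scale SSC buffer strength.
# 1 x SSC is taken as one fifth of the 5 x SSC composition stated in the text
# (0.75 M NaCl, 0.075 M sodium citrate).
SSC_1X_MOLAR = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc_molarity(strength):
    """Return molar concentrations for a given SSC strength (e.g. 0.2 for 0.2 x)."""
    return {solute: round(molar * strength, 6)
            for solute, molar in SSC_1X_MOLAR.items()}


if __name__ == "__main__":
    for strength in (5, 0.2, 0.1):
        print(f"{strength} x SSC:", ssc_molarity(strength))
```

On this arithmetic the wash in condition (1), 0.015 M NaCl/0.0015 M sodium citrate, corresponds to 0.1 x SSC, and the 0.2 x SSC wash in condition (3) corresponds to 0.03 M NaCl/0.003 M sodium citrate.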

Particular HRG-α nucleic acids are nucleic acids or oligonucleotides consisting of or
comprising a nucleotide sequence selected from Figs. 4a-4d and containing greater than 17
bases (when excluding nucleic acid sequences of human small polydisperse circular DNA
(HUMPC125), chicken c-mos proto-oncogene homolog (CHKMOS), basement membrane
heparin sulfate proteoglycan (HUMBMHSP) and human lipocortin 2 pseudogene (complete
cds-like region, HUMLIP2B)), ordinarily greater than 20 bases, preferably greater than 25 bases,
together with the complementary sequences thereof.

Particular HRG-β1, -β2 or -β3 nucleic acids are nucleic acids or oligonucleotides
consisting of or comprising a nucleotide sequence selected from Figs. 8a-8d, 12a-12e or 13a-13c
and containing greater than 20 bases, but not including the polyA sequence found at the 3'
end of each gene as noted in the Figures, together with the complements to such sequences.
Preferably the sequence contains greater than 25 bases. HRG-β sequences also
may exclude the human small polydisperse circular DNA sequence (HUMPC125).

In other embodiments, the HRG nucleotide sequence contains a 15 or more base HRG
sequence and is selected from within the sequence encoding the HRG domain extending from
the N-terminus of the GFD to the N-terminus of the transmembrane sequence (or the
complement of that nucleic acid sequence). For example, with respect to HRG-α, the
nucleotide sequence is selected from within the sequence 678-869 (Fig. 4b) and contains a
sequence of 15 or more bases from this section of the HRG nucleic acid.
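
To make the selection of a 15-base sequence from within positions 678-869 concrete, the following Python sketch lists every candidate window by coordinate. It is illustrative only: probe_windows is a hypothetical helper, coordinates are treated as 1-based and inclusive, and no sequence data from Fig. 4b are reproduced.

```python
# Illustrative sketch: enumerate candidate probe windows inside a region,
# using 1-based inclusive coordinates as in the text (e.g. 678-869).
def probe_windows(region_start, region_end, length):
    """Return (start, end) pairs for every window of `length` bases in the region."""
    region_size = region_end - region_start + 1
    if length < 1 or region_size < length:
        return []
    last_start = region_end - length + 1
    return [(start, start + length - 1)
            for start in range(region_start, last_start + 1)]


if __name__ == "__main__":
    windows = probe_windows(678, 869, 15)
    print(len(windows), windows[0], windows[-1])
```

The region spans 192 bases, so there are 178 distinct 15-base windows; longer probes reduce that count accordingly.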

In other embodiments, the HRG nucleic acid sequence is greater than 14 bases and is
selected from a nucleotide sequence unique to each subtype, for instance a nucleic acid
sequence encoding an amino acid sequence that is unique to each of the HRG subtypes (or the
complement of that nucleic acid sequence). These sequences are useful in diagnostic assays
for expression of the various subtypes, as well as specific amplification of the subtype DNA.
For example, the HRG-α sequence of interest would be selected from the sequence encoding
the unique N-terminus or GFD-transmembrane joining sequence, e.g. about bp 771-860.
Similarly, a unique HRG-β1 sequence is that which encodes the last 15 C-terminal amino acid
residues; this sequence is not found in HRG-α.

In general, the length of the HRG-α or -β sequence beyond the above-indicated
number of bases is immaterial since all of such nucleic acids are useful as probes or
amplification primers. The selected HRG sequence may contain additional HRG sequence,
either the normal flanking sequence or other regions of the HRG nucleic acid, as well as other
nucleic acid sequences. For purposes of hybridization, only the HRG sequence is material.

The term "control sequences" refers to DNA sequences necessary for the expression
of an operably linked coding sequence in a particular host organism. The control sequences
that are suitable for prokaryotes, for example, include a promoter, optionally an operator
sequence, a ribosome binding site, and possibly other as yet poorly understood sequences.
Eukaryotic cells are known to utilize promoters, polyadenylation signals, and enhancers.

Nucleic acid is "operably linked" when it is placed into a functional relationship with
another nucleic acid sequence. For example, DNA for a presequence or secretory leader is
operably linked to DNA for a polypeptide if it is expressed as a preprotein that participates in
the secretion of the polypeptide; a promoter or enhancer is operably linked to a coding
sequence if it affects the transcription of the sequence; or a ribosome binding site is operably
linked to a coding sequence if it is positioned so as to facilitate translation. Generally,
"operably linked" means that the DNA sequences being linked are contiguous and, in the case
of a secretory leader, contiguous and in reading phase. However, enhancers do not have to be
contiguous. Linking is accomplished by ligation at convenient restriction sites. If such sites do
not exist, then synthetic oligonucleotide adaptors or linkers are used in accord with
conventional practice.

An "exogenous" element is defined herein to mean nucleic acid sequence that is foreign
to the cell, or homologous to the cell but in a position within the host cell nucleic acid in which
the element is ordinarily not found.

As used herein, the expressions "cell", "cell line", and "cell culture" are used
interchangeably, and all such designations include progeny. Thus, the words "transformants"
and "transformed cells" include the primary subject cell and cultures derived therefrom without
regard for the number of transfers. It is also understood that all progeny may not be precisely
identical in DNA content, due to deliberate or inadvertent mutations. Mutant progeny that
have the same function or biological activity as screened for in the originally transformed cell
are included. It will be clear from the context where distinct designations are intended.

"Plasmids" are designated by a lower case "p" preceded and/or followed by capital
letters and/or numbers. The starting plasmids herein are commercially available, are publicly
available on an unrestricted basis, or can be constructed from such available plasmids in
accord with published procedures. In addition, other equivalent plasmids are known in the art
and will be apparent to the ordinary artisan.

"Restriction Enzyme Digestion" of DNA refers to catalytic cleavage of the DNA with
an enzyme that acts only at certain locations in the DNA. Such enzymes are called
restriction endonucleases, and the site for which each is specific is called a restriction site.
The various restriction enzymes used herein are commercially available and their reaction
conditions, cofactors, and other requirements as established by the enzyme suppliers are used.
Restriction enzymes commonly are designated by abbreviations composed of a capital letter
followed by other letters representing the microorganism from which each restriction enzyme
originally was obtained, and then a number designating the particular enzyme. In general,
about 1 μg of plasmid or DNA fragment is used with about 1-2 units of enzyme in about 20 μl
of buffer solution. Appropriate buffers and substrate amounts for particular restriction
enzymes are specified by the manufacturer. Incubation of about 1 hour at 37°C is ordinarily
used, but may vary in accordance with the supplier's instructions. After incubation, protein or
polypeptide is removed by extraction with phenol and chloroform, and the digested nucleic acid
is recovered from the aqueous fraction by precipitation with ethanol. Digestion with a
restriction enzyme may be followed with bacterial alkaline phosphatase hydrolysis of the
terminal 5' phosphates to prevent the two restriction cleaved ends of a DNA fragment from
"circularizing" or forming a closed loop that would impede insertion of another DNA fragment
at the restriction site. Unless otherwise stated, digestion of plasmids is not followed by 5'
terminal dephosphorylation. Procedures and reagents for dephosphorylation are conventional
as described in sections 1.56-1.61 of Sambrook et al. (Molecular Cloning: A Laboratory Manual,
New York: Cold Spring Harbor Laboratory Press, 1989).
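
The notion of an enzyme that cleaves only at a defined site can be sketched computationally. The Python example below is illustrative and not part of the procedures described herein: it locates EcoRI recognition sites (GAATTC, cut after the first G on the strand shown) in a made-up linear sequence and returns the resulting top-strand fragments. The function names and the toy sequence are assumptions.

```python
# Illustrative sketch: in-silico digestion of a linear DNA with one enzyme.
# The toy sequence below is invented for the example.
def find_sites(sequence, site):
    """Return 0-based start indices of every occurrence of `site`."""
    sequence, site = sequence.upper(), site.upper()
    positions = []
    index = sequence.find(site)
    while index != -1:
        positions.append(index)
        index = sequence.find(site, index + 1)
    return positions


def linear_fragments(sequence, site, cut_offset):
    """Cut `cut_offset` bases into each site and return the top-strand fragments."""
    cuts = [start + cut_offset for start in find_sites(sequence, site)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "AAAGAATTCCCCGGGGAATTCTT"
    # EcoRI recognizes GAATTC and cuts between G and A (offset 1).
    for fragment in linear_fragments(toy, "GAATTC", 1):
        print(len(fragment), fragment)
```

Only the strand shown is modeled; the staggered overhang left on the complementary strand is not represented.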

"Ligation" refers to the process of forming phosphodiester bonds between two nucleic
acid fragments. To ligate the DNA fragments together, the ends of the DNA fragments must
be compatible with each other. In some cases, the ends will be directly compatible after
endonuclease digestion. However, it may be necessary to first convert the staggered ends
commonly produced after endonuclease digestion to blunt ends to make them compatible for
ligation. To blunt the ends, the DNA is treated in a suitable buffer for at least 15 minutes at
15°C with about 10 units of the Klenow fragment of DNA polymerase I or T4 DNA
polymerase in the presence of the four deoxyribonucleotide triphosphates. The DNA is then
purified by phenol-chloroform extraction and ethanol precipitation. The DNA fragments that
are to be ligated together are put in solution in about equimolar amounts. The solution will also
contain ATP, ligase buffer, and a ligase such as T4 DNA ligase at about 10 units per 0.5 μg
of DNA. If the DNA is to be ligated into a vector, the vector is first linearized by digestion
with the appropriate restriction endonuclease(s). The linearized fragment is then treated with
bacterial alkaline phosphatase or calf intestinal phosphatase to prevent self-ligation during
the ligation step.
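
The "about equimolar amounts" and "about 10 units per 0.5 μg" figures translate into simple proportions. The following Python sketch is a worked example under stated assumptions: both fragments are double-stranded DNA, so mass scales with length in base pairs, and the function names and example sizes are hypothetical.

```python
# Illustrative sketch: amounts for a ligation set up as described in the text.
def equimolar_insert_mass(vector_mass_ng, vector_bp, insert_bp):
    """Mass of insert (ng) giving the same number of molecules as the vector.

    Assumes double-stranded DNA, so molar amount is proportional to mass / length.
    """
    return vector_mass_ng * insert_bp / vector_bp


def ligase_units(total_dna_ug, units_per_half_ug=10.0):
    """Ligase units at the recited ratio of about 10 units per 0.5 ug of DNA."""
    return units_per_half_ug * total_dna_ug / 0.5


if __name__ == "__main__":
    insert_ng = equimolar_insert_mass(100, 5000, 1000)
    total_ug = (100 + insert_ng) / 1000
    print(insert_ng, "ng insert;", ligase_units(total_ug), "units ligase")
```

For a hypothetical 5000 bp vector at 100 ng and a 1000 bp insert, the equimolar insert amount is 20 ng.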

The technique of "polymerase chain reaction," or "PCR," as used herein generally
refers to a procedure wherein minute amounts of a specific piece of nucleic acid, RNA and/or
DNA, are amplified as described in U.S. Pat. No. 4,683,195, issued 28 July 1987. Generally,
sequence information from the ends of the region of interest or beyond needs to be available,
such that oligonucleotide primers can be designed; these primers will be identical or similar in
sequence to opposite strands of the template to be amplified. The 5' terminal nucleotides of
the two primers may coincide with the ends of the amplified material. PCR can be used to
amplify specific RNA sequences, specific DNA sequences from total genomic DNA, and cDNA
transcribed from total cellular RNA, bacteriophage or plasmid sequences, etc. See generally
Mullis et al., Cold Spring Harbor Symp. Quant. Biol., 51: 263 (1987); Erlich, ed., PCR
Technology, (Stockton Press, NY, 1989). As used herein, PCR is considered to be one, but
not the only, example of a nucleic acid polymerase reaction method for amplifying a nucleic
acid test sample, comprising the use of a known nucleic acid (DNA or RNA) as a primer, and
utilizes a nucleic acid polymerase to amplify or generate a specific piece of nucleic acid or to
amplify or generate a specific piece of nucleic acid which is complementary to a particular
nucleic acid.
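
The statement that the two primers are identical in sequence to opposite strands of the template can be illustrated as follows. This Python sketch is an assumption-based example, not a primer-design procedure from the text: it takes the forward primer from the 5' end of the strand shown and the reverse primer as the reverse complement of its 3' end, and the template is invented.

```python
# Illustrative sketch: derive a primer pair from the ends of a template strand.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement, i.e. the opposite strand read 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


def end_primers(template, length):
    """Return (forward, reverse) primers whose 5' ends coincide with the template ends."""
    if length > len(template):
        raise ValueError("primer length exceeds template length")
    forward = template[:length]                       # identical to the strand shown
    reverse = reverse_complement(template[-length:])  # identical to the opposite strand
    return forward, reverse


if __name__ == "__main__":
    toy_template = "ATGGCCAAGTTTGACCTGAA"  # invented sequence
    print(end_primers(toy_template, 6))
```

Practical primer choice also weighs length, melting temperature and specificity, none of which is modeled here.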

The "HRG tyrosine autophosphorylation assay" to detect the presence of HRG
ligands was used to monitor the purification of a ligand for the p185HER2 receptor. This assay
is based on the assumption that a specific ligand for the p185HER2 receptor will stimulate
autophosphorylation of the receptor, in analogy with EGF and its stimulation of EGF receptor
autophosphorylation. MDA-MB-453 cells or MCF7 cells, which contain high levels of p185HER2
receptors but negligible levels of human EGF receptors, were obtained from the American
Type Culture Collection, Rockville, Md. (ATCC No. HTB-131) and maintained in tissue culture
with 10% fetal calf serum in DMEM/Ham's F12 (1:1) media. For assay, the cells were
trypsinized and plated at about 150,000 cells/well in 24 well dishes (Costar). After incubation
with serum-containing media overnight, the cells were placed in serum-free media for 2-18
hours before assay. Test samples of 100 μL aliquots were added to each well. The cells
were incubated for 5-30 minutes (typically 30 min) at 37°C and the media removed. The
cells in each well were treated with 100 μL SDS gel denaturing buffer (Seprosol, Enpotech,
Inc.) and the plates heated at 100°C for 5 minutes to dissolve the cells and denature the
proteins. Aliquots from each well were electrophoresed on 5-20% gradient SDS gels (Novex,
Encinitas, CA) according to the manufacturer's directions. After the dye front reached the
bottom of the gel, the electrophoresis was terminated and a sheet of PVDF membrane
(ProBlott, ABI) was placed on the gel and the proteins transferred from the gel to the
membrane in a blotting chamber (BioRad) at 200 mAmps for 30-60 min. After blotting, the
membranes were incubated with Tris buffered saline containing 0.1% Tween 20 detergent
buffer with 5% BSA for 2-18 hrs to block nonspecific binding, and then treated with a mouse
anti-phosphotyrosine antibody (Upstate Biological Inc., N.Y.). Subsequently, the membrane
blots were treated with goat anti-mouse antibody conjugated to alkaline phosphatase. The
gels were developed using the ProtoBlot System from Promega. After drying the membranes,
the density of the bands corresponding to p185HER2 in each sample lane was quantitated with
a Hewlett Packard ScanJet Plus Scanner attached to a Macintosh computer. The number
of receptors per cell in the MDA-MB-453 or MCF-7 cells is such that under these experimental
conditions the p185HER2 receptor protein is the major protein which is labeled.

"Protein microsequencing" was accomplished based upon the following procedures.
Proteins from the final HPLC step were either sequenced directly by automated Edman
degradation with a model 470A Applied Biosystems gas phase sequencer equipped with a
120A PTH amino acid analyzer or sequenced after digestion with various chemicals or
enzymes. PTH amino acids were integrated using the ChromPerfect data system (Justice
Innovations, Palo Alto, CA). Sequence interpretation was performed on a VAX 11/785 Digital
Equipment Corporation computer as described (Henzel et al., J. Chromatography 404:41-52
(1987)). In some cases, aliquots of the HPLC fractions were electrophoresed on 5-20% SDS
polyacrylamide gels, electrotransferred to a PVDF membrane (ProBlott, ABI, Foster City,
CA) and stained with Coomassie Brilliant Blue (Matsudaira, P., J. Biol. Chem. 262:10035-10038,
1987). The specific protein was excised from the blot for N-terminal sequencing. To
determine internal protein sequences, HPLC fractions were dried under vacuum (SpeedVac),
resuspended in appropriate buffers, and digested with cyanogen bromide, the lysine-specific
enzyme Lys-C (Wako Chemicals, Richmond, VA) or Asp-N (Boehringer Mannheim,
Indianapolis, Ind.). After digestion, the resultant peptides were sequenced as a mixture or
were resolved by HPLC on a C4 column developed with a propanol gradient in 0.1% TFA
before sequencing as described above.

II. USE AND PREPARATION OF HRG POLYPEPTIDES 
1. PREPARATION OF HRG POLYPEPTIDES INCLUDING VARIANTS

The system to be employed in preparing HRG polypeptides will depend upon the
particular HRG sequence selected. If the sequence is sufficiently small, HRG is prepared by
in vitro polypeptide synthetic methods. Most commonly, however, HRG is prepared in
recombinant cell culture using the host-vector systems described below.

In general, mammalian host cells will be employed, and such hosts may or may not
contain post-translational systems for processing HRG prosequences in the normal fashion. If
the host cells contain such systems then it will be possible to recover natural subdomain
fragments such as HRG-GFD or HRG-NTD-GFD from the cultures. If not, then the proper
processing can be accomplished by transforming the hosts with the required enzyme(s) or by
cleaving the precursor in vitro. However, it is not necessary to transform cells with DNA
encoding the complete prosequence for a selected HRG when it is desired to only produce
fragments of HRG sequences such as an HRG-GFD. For example, to prepare HRG-GFD a
start codon is ligated to the 5' end of DNA encoding an HRG-GFD, this DNA is used to
transform host cells and the product expressed directly as the Met N-terminal form (if
desired, the extraneous Met may be removed in vitro or by endogenous N-terminal
demethionylases). Alternatively, HRG-GFD is expressed as a fusion with a signal sequence
recognized by the host cell, which will process and secrete the mature HRG-GFD as is further
described below. Amino acid sequence variants of native HRG-GFD sequences are produced
in the same way.

HRG-NTD is produced in the same fashion as the full length molecule but from
expression of DNA encoding only HRG-NTD, with the stop codon after one of S172-C182 (Fig.
15).

In addition, HRG variants are expressed from DNA encoding protein in which both the
GFD and NTD domains are in their proper orientation but which contain an amino acid
insertion, deletion or substitution at the NTD-GFD joining site (for example located within the
sequence S172-C182). In another embodiment a stop codon is positioned at the 3' end of the
NTD-GFD-encoding sequence (after any residue T/Q222-T245 of Fig. 15). The result is a
soluble form of HRG-α or -β1 or -β2 which lacks its transmembrane sequence (this sequence
also may be an internal signal sequence but will be referred to as a transmembrane sequence).
In further variations of this embodiment, an internal signal sequence of another polypeptide is
substituted in place of the native HRG transmembrane domain, or a cytoplasmic domain of
another cell membrane polypeptide, e.g. receptor kinase, is substituted for the HRG-α or
HRG-β1-β2 cytoplasmic peptide.

In a still further embodiment, the NTD, GFD and transmembrane domains of HRG and
other EGF family members are substituted for one another, e.g. the NTD equivalent region of
EGF is substituted for the NTD of HRG, or the GFD of HRG is substituted for EGF in the
processed, soluble proform of EGF. Alternatively, an HRG or EGF family member
transmembrane domain is fused onto the C-terminal E236 of HRG-β3.

In a further variant, the HRG sequence spanning K241 to the C-terminus is fused at
its N-terminus to the C-terminus of a non-HRG polypeptide.

Another embodiment comprises the functional or structural deletion of the proteolytic
processing site in CTP, the GFD-transmembrane spanning domain. For example, the putative
C-terminal lysine (K241) of processed HRG-α or β1-β2 is deleted, substituted with another
residue, a residue other than K or R is inserted between K241 and R242, or other disabling
mutation is made in the prosequence.

In another embodiment, the domain of any EGF family member extending from (a) its
cysteine corresponding to (b) C221 to the C-terminal residue of the family member is
substituted for the analogous domain of HRG-α or β1-β2 (or fused to the C-terminus of
HRG-β3). Such variants will be processed free of host cells in the same fashion as the family
member rather than as the parental HRG. In more refined embodiments other specific
cleavage sites (e.g. protease sites) are substituted into the CTP or GFD-transmembrane
spanning domain (about residues T/Q222-T245, Fig. 15). For example, amphiregulin sequence
E84-K99 or TGFα sequence E44-K58 is substituted for HRG-α residues E223-K241.

In a further embodiment, a variant (termed HRG-NTDxGFD) is prepared wherein (1)
the lysine residue found in the NTD-GFD joining sequence VKC (residues 180-182, Figure 15) is
deleted or (preferably) substituted by another residue other than R such as H, A, T or S and
(2) a stop codon is introduced in the sequence RCT or RCQ (residues 220-222, Figure 15) in
place of C, or T (for HRG-α) or Q (for HRG-β).

A preferred HRG-α ligand with binding affinity to p185HER2 comprises amino acids
226-265 of figure 4. This HRG-α ligand further may comprise up to an additional 1-20 amino
acids preceding amino acid 226 from figure 4 and 1-20 amino acids following amino acid 265
from figure 4. A preferred HRG-β ligand with binding affinity to p185HER2 comprises amino
acids 226-265 of figure 8. This HRG-β ligand may comprise up to an additional 1-20 amino
acids preceding amino acid 226 from figure 8 and 1-20 amino acids following amino acid 265
from figure 8.

GFD sequences include those in which one or more residues corresponding to another
member of the EGF family are deleted or substituted or have a residue inserted adjacent
thereto. For example, F216 of HRG is substituted by Y, L202 with E, F189 with Y, or
S203-P205 is deleted.

HRG also includes NTD-GFD having its C-terminus at one of the first about 1 to 3
extracellular domain residues (QKR, residues 240-243, HRG-α, Figure 15) or first about 1-2
transmembrane region residues. In addition, in some HRG-GFD variants the codons are
modified at the GFD-transmembrane proteolysis site by substitution, insertion or deletion.
The GFD proteolysis site is the domain that contains the GFD C-terminal residue and about 5
residues N-terminal and 5 residues C-terminal from this residue. At this time the natural
C-terminal residue has not been identified for either HRG-α or HRG-β, although it is known
that Met-227 terminal and Val-229 terminal HRG-α-GFD are biologically active. The native
C-terminus for HRG-α-GFD is probably Met-227, Lys-228, Val-229, Gln-230, Asn-231 or Gln-232,
and for HRG-β1-β2-GFD is probably Met-226, Ala-227, Ser-228, Phe-229, Trp-230, Lys-231 or (for
HRG-β1) K240 or (for HRG-β2) K246. The native C-terminus is determined readily by
C-terminal sequencing, although it is not critical that HRG-GFD have the native terminus so long
as the GFD sequence possesses the desired activity. In some embodiments of HRG-GFD
variants, the amino acid change(s) in the CTP are screened for their ability to resist
proteolysis in vitro and inhibit the protease responsible for generation of HRG-GFD.

If it is desired to prepare the full length HRG polypeptides and the 5' or 3' ends of the
given HRG are not described herein, it may be necessary to prepare nucleic acids in which the
missing domains are supplied by homologous regions from more complete HRG nucleic acids.
Alternatively, the missing domains can be obtained by probing libraries using the DNAs
disclosed in the Figures or fragments thereof.

A. Isolation of DNA Encoding Heregulin

The DNA encoding HRG may be obtained from any cDNA library prepared from 
tissue believed to possess HRG mRNA and to express it at a detectable level. HRG DNA 
also is obtained from a genomic library.

Libraries are screened with probes or analytical tools designed to identify the gene of
interest or the protein encoded by it. For cDNA expression libraries, suitable probes include
monoclonal or polyclonal antibodies that recognize and specifically bind to HRG;
oligonucleotides of about 20-80 bases in length that encode known or suspected portions of
HRG cDNA from the same or different species; and/or complementary or homologous cDNAs
or fragments thereof that encode the same or a hybridizing gene. Appropriate probes for
screening genomic DNA libraries include, but are not limited to, oligonucleotides; cDNAs or
fragments thereof that encode the same or hybridizing DNA; and/or homologous genomic
DNAs or fragments thereof. Screening the cDNA or genomic library with the selected probe
may be conducted using standard procedures as described in chapters 10-12 of Sambrook et
al., supra.

An alternative means to isolate the gene encoding HRG is to use polymerase chain
reaction (PCR) methodology as described in section 14 of Sambrook et al., supra. This
method requires the use of oligonucleotide probes that will hybridize to HRG. Strategies for
selection of oligonucleotides are described below.

Another alternative method for obtaining the gene of interest is to chemically
synthesize it using one of the methods described in Engels et al. (Angew. Chem. Int. Ed. Engl.,
28: 716-734, 1989). These methods include triester, phosphite, phosphoramidite and
H-phosphonate methods, PCR and other autoprimer methods, and oligonucleotide syntheses on
solid supports. These methods may be used if the entire nucleic acid sequence of the gene is
known, or the sequence of the nucleic acid complementary to the coding strand is available, or
alternatively, if the target amino acid sequence is known, one may infer potential nucleic acid
sequences using known and preferred coding residues for each amino acid residue.

A preferred method of practicing this invention is to use carefully selected
oligonucleotide sequences to screen cDNA libraries from various tissues, preferably human
breast, colon, salivary gland, placental, fetal, brain, and carcinoma cell lines. Other biological
sources of DNA encoding a heregulin-like ligand include other mammals and birds. Among the
preferred mammals are members of the following orders: bovine, ovine, equine, murine, and
rodentia.

The oligonucleotide sequences selected as probes should be of sufficient length and
sufficiently unambiguous that false positives are minimized. The actual nucleotide
sequence(s) is usually based on conserved or highly homologous nucleotide sequences or
regions of HRG-α. The oligonucleotides may be degenerate at one or more positions. The use
of degenerate oligonucleotides may be of particular importance where a library is screened
from a species in which preferential codon usage is not known. The
oligonucleotide must be labeled such that it can be detected upon hybridization to DNA in the
library being screened. The preferred method of labeling is to use 32P-labeled ATP with
polynucleotide kinase, as is well known in the art, to radiolabel the oligonucleotide. However,
other methods may be used to label the oligonucleotide, including, but not limited to,
biotinylation or enzyme labeling.
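
The degree of degeneracy of an oligonucleotide pool designed from an amino acid sequence can be estimated by multiplying the number of codons available for each residue. The Python sketch below is illustrative only: it assumes the standard genetic code, full codons for every residue, and no codon-usage bias, and the names CODON_COUNTS and pool_degeneracy are hypothetical.

```python
# Illustrative sketch: size of a fully degenerate oligonucleotide pool that
# encodes a given peptide, using codon counts from the standard genetic code.
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def pool_degeneracy(peptide):
    """Return the number of distinct coding sequences for `peptide`."""
    total = 1
    for residue in peptide.upper():
        total *= CODON_COUNTS[residue]
    return total


if __name__ == "__main__":
    # GKKKER is the six-residue NTD motif discussed earlier in the text.
    print(pool_degeneracy("GKKKER"))
```

Stretches rich in Met and Trp give the least degenerate pools, which is one common reason for favoring them when a probe region can be chosen freely.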

Of particular interest is HRG nucleic acid that encodes the full-length propolypeptide.
In some preferred embodiments, the nucleic acid sequence includes the native HRG signal
transmembrane sequence. Nucleic acid having all the protein coding sequence is obtained by
screening selected cDNA or genomic libraries, and, if necessary, using conventional primer
extension procedures as described in section 7.79 of Sambrook et al., supra, to detect
precursors and processing intermediates of mRNA that may not have been reverse-transcribed
into cDNA.

HRG-encoding DNA is used to isolate DNA encoding the analogous ligand from other
animal species via hybridization employing the methods discussed above. The preferred
animals are mammals, particularly bovine, ovine, equine, feline, canine and rodentia, and more
specifically rats, mice and rabbits.

B. Amino Acid Sequence Variants of Heregulin

Amino acid sequence variants of HRG are prepared by introducing appropriate
nucleotide changes into HRG DNA, or by in vitro synthesis of the desired HRG polypeptide.
Such variants include, for example, deletions from, or insertions or substitutions of, residues
within the amino acid sequence shown for human HRG sequences. Any combination of
deletion, insertion, and substitution can be made to arrive at the final construct, provided that
the final construct possesses the desired characteristics. The amino acid changes also may
alter post-translational processes of HRG-α, such as changing the number or position of
glycosylation sites, altering the membrane anchoring characteristics, altering the intracellular
location of HRG by inserting, deleting, or otherwise affecting the transmembrane sequence of
native HRG, or modifying its susceptibility to proteolytic cleavage.

In designing amino acid sequence variants of HRG, the location of the mutation site
and the nature of the mutation will depend on the HRG characteristic(s) to be modified. The sites
for mutation can be modified individually or in series, e.g., by (1) substituting first with
conservative amino acid choices and then with more radical selections depending upon the
results achieved, (2) deleting the target residue, or (3) inserting residues of other ligands
adjacent to the located site.

A useful method for identification of HRG residues or regions for mutagenesis is called
'alanine scanning mutagenesis' as described by Cunningham and Wells (Science, 244: 1081-1085,
1989). Here, a residue or group of target residues is identified (e.g., charged residues
such as arg, asp, his, lys, and glu) and replaced by a neutral or negatively charged amino acid
(most preferably alanine or polyalanine) to affect the interaction of the amino acids with the
surrounding aqueous environment in or outside the cell. Those domains demonstrating
functional sensitivity to the substitutions then are refined by introducing further or other
variants at or for the sites of substitution. Thus, while the site for introducing an amino acid
sequence variation is predetermined, the nature of the mutation per se need not be
predetermined. For example, to optimize the performance of a mutation at a given site, ala
scanning or random mutagenesis may be conducted at the target codon or region and the
expressed HRG variants are screened for the optimal combination of desired activity.
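
The enumeration step of an alanine scan can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the input is any protein sequence in one-letter code, and the default target set is the charged residues named above (arg, asp, his, lys, glu). It produces the list of single-substitution variants to be made; it says nothing about their activity.

```python
# Charged residues named in the text: arg, asp, his, lys, glu.
CHARGED = frozenset("RDHKE")


def alanine_scan(sequence: str, targets=CHARGED):
    """Yield (label, variant) for each single alanine substitution at a target residue.

    Labels use the convention <original residue><1-based position>A, e.g. K2A.
    """
    sequence = sequence.upper()
    for index, residue in enumerate(sequence):
        if residue in targets and residue != "A":
            variant = sequence[:index] + "A" + sequence[index + 1:]
            yield f"{residue}{index + 1}A", variant
```

For the made-up pentapeptide MKEGR, `list(alanine_scan("MKEGR"))` gives the three variants K2A, E3A and R5A.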

There are two principal variables in the construction of amino acid sequence variants:
the location of the mutation site and the nature of the mutation. These are variants from
HRG sequence, and may represent naturally occurring alleles (which will not require
manipulation of HRG DNA) or predetermined mutant forms made by mutating the DNA, either
to arrive at an allele or a variant not found in nature. In general, the location and nature of
the mutation chosen will depend upon the HRG characteristic to be modified. Obviously, such
variations that, for example, convert HRG into a known receptor ligand, are not included
within the scope of this invention, nor are any other HRG variants or polypeptide sequences
that are not novel and unobvious over the prior art.

Amino acid sequence deletions generally range from about 1 to 30 residues, more
preferably about 1 to 10 residues, and typically about 1 to 5 contiguous residues. Deletions
may be introduced into regions of low homology with other EGF family precursors to modify
the activity of HRG. Deletions from HRG in areas of substantial homology with other EGF
family sequences will be more likely to modify the biological activity of HRG more significantly.
The number of consecutive deletions will be selected so as to preserve the tertiary structure
of HRG in the affected domain, e.g., cysteine crosslinking, beta-pleated sheet or alpha helix.

Amino acid sequence insertions include amino- and/or carboxyl-terminal fusions
ranging in length from one residue to polypeptides containing a hundred or more residues, as
well as intrasequence insertions of single or multiple amino acid residues. Intrasequence
insertions (i.e., insertions within HRG sequence) may range generally from about 1 to 10
residues, more preferably 1 to 5, and most preferably 1 to 3. Examples of terminal insertions
include HRG with an N-terminal methionyl residue (an artifact of the direct expression of HRG
in bacterial recombinant cell culture), and fusion of a heterologous N-terminal signal sequence
to the N-terminus of HRG to facilitate the secretion of mature HRG from recombinant host
cells. Such signal sequences generally will be obtained from, and thus be homologous to, the
intended host cell species. Suitable sequences include STII or lpp for E. coli, alpha factor for
yeast, and viral signals such as herpes gD for mammalian cells.

Other insertional variants of HRG include the fusion to the N- or C-terminus of HRG
of an immunogenic polypeptide, e.g., bacterial polypeptides such as beta-lactamase or an
enzyme encoded by the E. coli trp locus, or yeast protein, bovine serum albumin, and
chemotactic polypeptides. C-terminal fusions of HRG-NTD-GFD with proteins having a long
half-life such as immunoglobulin constant regions (or other immunoglobulin regions), albumin, or
ferritin, as described in WO 89/02922, published 6 April 1989, are included.

Another group of variants are amino acid substitution variants. These variants have
at least one amino acid residue in the HRG molecule removed and a different residue inserted
in its place. The sites of greatest interest for substitutional mutagenesis include sites
identified as the active site(s) of HRG, and sites where the amino acids found in HRG ligands
from various species are substantially different in terms of side-chain bulk, charge, and/or
hydrophobicity.

The amino terminus of the cytoplasmic region of HRG may be fused to the carboxy
terminus of heterologous transmembrane domains and receptors, to form a fusion polypeptide
useful for intracellular signaling of a ligand binding to the heterologous receptor.

Other sites of interest are those in which particular residues of HRG-like ligands
obtained from various species are identical. These positions may be important for the
biological activity of HRG. These sites, especially those falling within a sequence of at least
three other identically conserved sites, are substituted in a relatively conservative manner.
Such conservative substitutions are shown in Table 1 under the heading of 'preferred
substitutions'. If such substitutions result in a change in biological activity, then more
substantial changes, denominated exemplary substitutions in Table 1, or as further described
below in reference to amino acid classes, are introduced and the products screened.

TABLE 1

Original     Exemplary                                      Preferred
Residue      Substitutions                                  Substitutions

Ala (A)      val; leu; ile                                  val
Arg (R)      lys; gln; asn                                  lys
Asn (N)      gln; his; lys; arg                             gln
Asp (D)      glu                                            glu
Cys (C)      ser                                            ser
Gln (Q)      asn                                            asn
Glu (E)      asp                                            asp
Gly (G)      pro                                            pro
His (H)      asn; gln; lys; arg                             arg
Ile (I)      leu; val; met; ala; phe; norleucine            leu
Leu (L)      norleucine; ile; val; met; ala; phe            ile
Lys (K)      arg; gln; asn                                  arg
Met (M)      leu; phe; ile                                  leu
Phe (F)      leu; val; ile; ala                             leu
Pro (P)      gly                                            gly
Ser (S)      thr                                            thr
Thr (T)      ser                                            ser
Trp (W)      tyr                                            tyr
Tyr (Y)      trp; phe; thr; ser                             phe
Val (V)      ile; leu; met; phe; ala; norleucine            leu

Substantial modifications in function or immunological identity of HRG are
accomplished by selecting substitutions that differ significantly in their effect on maintaining
(a) the structure of the polypeptide backbone in the area of the substitution, for example, as
a sheet or helical conformation, (b) the charge or hydrophobicity of the molecule at the target
site, or (c) the bulk of the side chain. Naturally occurring residues are divided into groups
based on common side chain properties:

1) hydrophobic: norleucine, met, ala, val, leu, ile;

2) neutral hydrophilic: cys, ser, thr;

3) acidic: asp, glu;

4) basic: asn, gln, his, lys, arg;

5) residues that influence chain orientation: gly, pro; and

6) aromatic: trp, tyr, phe.

Non-conservative substitutions will entail exchanging a member of one of these
classes for another. Such substituted residues may be introduced into regions of HRG that
are homologous with other receptor ligands, or, more preferably, into the non-homologous
regions of the molecule.
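
The preferred substitutions of Table 1 and the six side-chain classes lend themselves to a simple lookup. The sketch below is illustrative only: the dictionary and function names are hypothetical, residues are given in one-letter code, and norleucine is omitted because it has no standard one-letter code. The class membership follows the grouping given in the text, including the placement of asn and gln under "basic".

```python
# Preferred substitutions from Table 1 (one-letter codes).
PREFERRED = {
    "A": "V", "R": "K", "N": "Q", "D": "E", "C": "S", "Q": "N", "E": "D",
    "G": "P", "H": "R", "I": "L", "L": "I", "K": "R", "M": "L", "F": "L",
    "P": "G", "S": "T", "T": "S", "W": "Y", "Y": "F", "V": "L",
}

# Side-chain classes as grouped in the text (norleucine omitted).
CLASSES = {
    "hydrophobic": set("MAVLI"),
    "neutral hydrophilic": set("CST"),
    "acidic": set("DE"),
    "basic": set("NQHKR"),
    "chain orientation": set("GP"),
    "aromatic": set("WYF"),
}


def residue_class(residue: str) -> str:
    """Return the name of the class that contains `residue`."""
    for name, members in CLASSES.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unknown residue: {residue!r}")


def is_non_conservative(original: str, replacement: str) -> bool:
    """True when the substitution exchanges a member of one class for another."""
    return residue_class(original) != residue_class(replacement)
```

The two schemes are independent of one another: under this grouping the Table 1 preferred substitution for Phe (leu) crosses a class boundary, so `is_non_conservative("F", "L")` is true even though the change is listed as preferred.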

In one embodiment of the invention, it is desirable to inactivate one or more protease
cleavage sites that are present in the molecule. These sites are identified by inspection of the
encoded amino acid sequence. Where potential protease cleavage sites are identified, e.g. at
K241 R242, they are rendered inactive to proteolytic cleavage by substituting the targeted
residue with another residue, preferably a basic residue such as glutamine or a hydrophilic
residue such as serine; by deleting the residue; or by inserting a prolyl residue immediately
after the residue.

In another embodiment, any methionyl residue other than the starting methionyl
residue, or any residue located within about three residues N- or C-terminal to each such
methionyl residue, is substituted by another residue (preferably in accord with Table 1) or
deleted. We have found that oxidation of the 2 GFD M residues in the course of E. coli
expression appears to severely reduce GFD activity. Thus, these M residues are mutated in
accord with Table 1. Alternatively, about 1-3 residues are inserted adjacent to such sites.

Any cysteine residues not involved in maintaining the proper conformation of HRG
also may be substituted, generally with serine, to improve the oxidative stability of the
molecule and prevent aberrant crosslinking.

Sites particularly suited for substitutions, deletions or insertions, or use as fragments,
include, numbered from the N-terminus of HRG-α of Figure 4:

1) potential glycosaminoglycan addition sites at the serine-glycine dipeptides at 42-43,
64-65, 151-152;

2) potential asparagine-linked glycosylation at positions 164, 170, 208 and 437, sites
(NDS) 164-166, (NIT) 170-172, (NTS) 208-210, and NTS (609-611);

3) potential O-glycosylation in a cluster of serine and threonine at 209-218;

4) cysteines at 226, 234, 240, 254, 256 and 265;

5) transmembrane domain at 287-309;

6) loop 1 delineated by cysteines 226 and 240;

7) loop 2 delineated by cysteines 234 and 254;

8) loop 3 delineated by cysteines 256 and 265; and

9) potential protease processing sites at 2-3, 8-9, 23-24, 33-34, 36-37, 45-46, 48-49, 62-63,
66-67, 86-87, 110-111, 123-124, 134-135, 142-143, 272-273, 278-279 and 285-286.
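
Candidate asparagine-linked glycosylation sites such as the NDS, NIT and NTS tripeptides listed above can be located by inspection of a sequence. The sketch below assumes the commonly used consensus N-X-S/T with X being any residue other than proline; that rule is general background and is not stated in this specification, and the function name and test sequences are hypothetical.

```python
import re

# Lookahead so that overlapping candidate tripeptides are all reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence: str):
    """Return (1-based position of the Asn, tripeptide) for each N-X-S/T match."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence.upper())]
```

Positions are reported relative to whatever sequence is supplied, so numbering agrees with a figure only if the full sequence of that figure is used as input.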

Analogous regions in HRG-β1 may be determined by reference to Figure 9, which aligns
analogous amino acids in HRG-α and HRG-β1. The analogous HRG-β1 amino acids may be
mutated or modified as discussed above for HRG-α. Analogous regions in HRG-β2 may be
determined by reference to Figure 15, which aligns analogous amino acids in HRG-α, HRG-β1
and HRG-β2. The analogous HRG-β2 amino acids may be mutated or modified as discussed
above for HRG-α or HRG-β1. Analogous regions in HRG-β3 may be determined by
reference to Figure 15, which aligns analogous amino acids in HRG-α, HRG-β1 and HRG-β2.
The analogous HRG-β3 amino acids may be mutated or modified as discussed above for
HRG-α, HRG-β1, or HRG-β2.

DNA encoding amino acid sequence variants of HRG is prepared by a variety of
methods known in the art. These methods include, but are not limited to, isolation from a
natural source (in the case of naturally occurring amino acid sequence variants) or
preparation by oligonucleotide-mediated (or site-directed) mutagenesis, PCR mutagenesis, and
cassette mutagenesis of an earlier prepared variant or a non-variant version of HRG. These
techniques may utilize HRG nucleic acid (DNA or RNA), or nucleic acid complementary to
HRG nucleic acid.

Oligonucleotide-mediated mutagenesis is a preferred method for preparing substitution,
deletion, and insertion variants of HRG DNA. This technique is well known in the art as
described by Adelman et al., DNA, 2: 183 (1983).

Generally, oligonucleotides of at least 25 nucleotides in length are used. An optimal
oligonucleotide will have 12 to 15 nucleotides that are completely complementary to the
template on either side of the nucleotide(s) coding for the mutation. This ensures that the
oligonucleotide will hybridize properly to the single-stranded DNA template molecule. The
oligonucleotides are readily synthesized using techniques known in the art such as that
described by Crea et al. (Proc. Natl. Acad. Sci. USA, 75: 5765, 1978).
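
The length rule for a mutagenic oligonucleotide can be sketched for the simplest case of a single codon replacement. This is a minimal illustration with hypothetical names: the input is assumed to be the coding-strand sequence, `codon_start` is a 0-based index, and the returned oligonucleotide is written in the same sense as the input, so it would anneal to the complementary single-stranded template.

```python
def design_mutagenic_oligo(template: str, codon_start: int, new_codon: str,
                           flank: int = 15) -> str:
    """Build an oligonucleotide carrying `new_codon` with `flank` matched bases per side.

    `codon_start` is the 0-based index of the first base of the codon to replace.
    """
    if not 12 <= flank <= 15:
        raise ValueError("flank should be 12 to 15 nucleotides")
    if len(new_codon) != 3:
        raise ValueError("new_codon must be three bases")
    left_start = codon_start - flank
    right_end = codon_start + 3 + flank
    if left_start < 0 or right_end > len(template):
        raise ValueError("template too short for the requested flanks")
    oligo = (template[left_start:codon_start] + new_codon
             + template[codon_start + 3:right_end])
    return oligo.upper()
```

With flanks of 12 to 15 bases the result is 27 to 33 nucleotides long, which satisfies the stated minimum of 25.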

Single-stranded DNA template may also be generated by denaturing double-stranded
plasmid (or other) DNA using standard techniques.

For alteration of the native DNA sequence (to generate amino acid sequence
variants, for example), the oligonucleotide is hybridized to the single-stranded template under
suitable hybridization conditions. A DNA polymerizing enzyme, usually the Klenow fragment
of DNA polymerase I, is then added to synthesize the complementary strand of the template
using the oligonucleotide as a primer for synthesis. A heteroduplex molecule is thus formed
such that one strand of DNA encodes the mutated form of HRG, and the other strand (the
original template) encodes the native, unaltered sequence of HRG. This heteroduplex molecule
is then transformed into a suitable host cell, usually a prokaryote such as E. coli JM101. After
the cells are grown, they are plated onto agarose plates and screened using the oligonucleotide
primer radiolabeled with 32P-phosphate to identify the bacterial colonies that contain the
mutated DNA. The mutated region is then removed and placed in an appropriate vector for
protein production, generally an expression vector of the type typically employed for
transformation of an appropriate host.

The method described immediately above may be modified such that a homoduplex
molecule is created wherein both strands of the plasmid contain the mutation(s). The
modifications are as follows: the single-stranded oligonucleotide is annealed to the
single-stranded template as described above. A mixture of three deoxyribonucleotides,
deoxyriboadenosine (dATP), deoxyriboguanosine (dGTP), and deoxyribothymidine (dTTP), is
combined with a modified thio-deoxyribocytosine called dCTP-(αS)
(Amersham Corporation). This mixture is added to the template-oligonucleotide complex.
Upon addition of DNA polymerase to this mixture, a strand of DNA identical to the template
except for the mutated bases is generated. In addition, this new strand of DNA will contain
dCTP-(αS) instead of dCTP, which serves to protect it from restriction endonuclease
digestion. After the template strand of the double-stranded heteroduplex is nicked with an
appropriate restriction enzyme, the template strand can be digested with ExoIII nuclease or
another appropriate nuclease past the region that contains the site(s) to be mutagenized. The
reaction is then stopped to leave a molecule that is only partially single-stranded. A complete
double-stranded DNA homoduplex is then formed using DNA polymerase in the presence of all
four deoxyribonucleotide triphosphates, ATP, and DNA ligase. This homoduplex molecule can
then be transformed into a suitable host cell such as E. coli JM101, as described above.

Exemplary substitutions common to any HRG include S2T or D; E3D or K; R4K or E;
K5R or E; E6D or K; G7P or Y; R8K or D; G9P or Y; K10R or E; G11P or Y; K12R or E; G19P
or Y; S20T or F; G21P or Y; K22R or E; K23R or E; Q38D; S107N; G108P; N120K; D121K;
S122T; N126S; I126L; T127S; A163V; N164K; T165-T174, any residue to I, L, V, M, F, D, E, R or
K; G175V or P; T176S or V; S177K or T; H178K or S; L179F or I; V180L or S; K181R or E;
A183N or V; E184K or D; K185R or E; E186D or Y; K187R or D; T188S or Q; F189Y or S; V191L
or D; N192Q or H; G193P or A; G194P or A; E195D or K; F197Y or I; M198V or Y; V199L or T;
K200V or R; D201E or K; L202E or K; S203A or T; N204A; N204Q; P205A; P205G; S206T or
R; R207K or A; Y208P or F; L209I or D; K211I or D; F216Y or I; T217H or S; G218A or P;
A/D219K or R; R220K or A; A235/240/232V or F; E236/241/233D or K; E237/242/234D or
K; L238/243/235I or T; Y239/244/236F or T; Q240/245/237N or K; K241/246/238H or R;
R242/247/238H or K; V243/248/239L or T; I244/249/240L or S; T245/250/241S or I;
I246/251/242V or T and T247/252/243S or I. Specifically with respect to HRG-α, T222S, K
or V; E223D, R or Q; N224Q, K or F; V225A, R or D; P226G, I, K or F; M227V, T, R or Y;
K228R, H or D; V229L, K or D; Q230N, R or Y; N231Q, K or Y; Q232N, R or Y; E233D, K or
T and K234R, H or D (adjacent K/R mutations are paired in alternative embodiments to
create new proteolysis sites). Specifically with respect to HRG-β (any member), Q222N, R
or Y; N223Q, K or Y; Y224F, T or R; V225A, K or D; M226V, T or R; A227V, K, Y or D;
S228T, Y or R; F229Y, I or K and Y230F, T or R are suitable variants. Specifically with
respect to HRG-β1, K231R or D; H232R or D; L233I, K, F or Y; G234P, R, A or S; I235L, K, F
or Y; E236D, R or A; F237I, Y, K or A; M238V, T, R or A and E239D, R or A are suitable
variants. Specifically with respect to HRG-β1 and HRG-β2, K231R or D are suitable
variants. Alternatively, each of these residues may be deleted or the indicated substituents
inserted adjacent thereto. In addition, from about 1-10 variants are combined to produce
combinations. These changes are made in the proHRG, NTD, GFD, NTD-GFD or other
fragments or fusions. Q213-G215, A219 and the about 11-21 residues C-terminal to C221
differ among the various HRG classes. Residues at these sites are interchanged among HRG
classes or EGF family members, are deleted, or a residue inserted adjacent thereto.

DNA encoding HRG-α mutants with more than one amino acid to be substituted may
be generated in one of several ways. If the amino acids are located close together in the
polypeptide chain, they may be mutated simultaneously using one oligonucleotide that codes
for all of the desired amino acid substitutions. If, however, the amino acids are located some
distance from each other (separated by more than about ten amino acids), it is more difficult
to generate a single oligonucleotide that encodes all of the desired changes. Instead, one of
two alternative methods may be employed.

PCR mutagenesis is also suitable for making amino acid variants of HRG-α. While
the following discussion refers to DNA, it is understood that the technique also finds
application with RNA. The PCR technique generally refers to the following procedure (see
Erlich, supra, the chapter by R. Higuchi, p. 61-70): When small amounts of template DNA are
used as starting material in a PCR, primers that differ slightly in sequence from the
corresponding region in a template DNA can be used to generate relatively large quantities of
a specific DNA fragment that differs from the template sequence only at the positions where
the primers differ from the template. For introduction of a mutation into a plasmid DNA, one
of the primers is designed to overlap the position of the mutation and to contain the mutation;
the sequence of the other primer must be identical to a stretch of sequence of the opposite
strand of the plasmid, but this sequence can be located anywhere along the plasmid DNA. It is
preferred, however, that the sequence of the second primer is located within 200 nucleotides
from that of the first, such that in the end the entire amplified region of DNA bounded by the
primers can be easily sequenced. PCR amplification using a primer pair like the one just
described results in a population of DNA fragments that differ at the position of the mutation
specified by the primer, and possibly at other positions, as template copying is somewhat
error-prone.

If the ratio of template to product material is extremely low, the vast majority of
product DNA fragments incorporate the desired mutation(s). This product material is used to
replace the corresponding region in the plasmid that served as PCR template using standard
DNA technology. Mutations at separate positions can be introduced simultaneously by either
using a mutant second primer, or performing a second PCR with different mutant primers and
ligating the two resulting PCR fragments simultaneously to the vector fragment in a three (or
more)-part ligation.

Another method for preparing variants, cassette mutagenesis, is based on the
technique described by Wells et al. (Gene, 34: 315, 1985). The starting material is the plasmid
(or other vector) comprising HRG DNA to be mutated. The codon(s) in HRG DNA to be
mutated are identified. There must be a unique restriction endonuclease site on each side of
the identified mutation site(s). If no such restriction sites exist, they may be generated using
the above-described oligonucleotide-mediated mutagenesis method to introduce them at
appropriate locations in HRG DNA. After the restriction sites have been introduced into the
plasmid, the plasmid is cut at these sites to linearize it. A double-stranded oligonucleotide
encoding the sequence of the DNA between the restriction sites but containing the desired
mutation(s) is synthesized using standard procedures. The two strands are synthesized
separately and then hybridized together using standard techniques. This double-stranded
oligonucleotide is referred to as the cassette. This cassette is designed to have 3' and 5' ends
that are compatible with the ends of the linearized plasmid, such that it can be directly ligated
to the plasmid. This plasmid now contains the mutated HRG DNA sequence.
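
The bookkeeping behind cassette mutagenesis, namely checking that each flanking restriction site is unique and swapping the intervening DNA, can be sketched in silico. This is a simplification with hypothetical names: the plasmid is treated as a linear string of one strand, the recognition sequences are kept intact, and the geometry of the cut ends is ignored. Checking one strand is adequate only for palindromic recognition sequences, which is an assumption of the sketch.

```python
def replace_cassette(plasmid: str, site_a: str, site_b: str, cassette: str) -> str:
    """Swap the DNA lying between two unique restriction sites for `cassette`.

    The recognition sequences themselves are kept; only the interior is replaced.
    """
    plasmid, site_a, site_b, cassette = (
        s.upper() for s in (plasmid, site_a, site_b, cassette)
    )
    for site in (site_a, site_b):
        if plasmid.count(site) != 1:
            raise ValueError(f"site {site} must occur exactly once")
    start = plasmid.index(site_a) + len(site_a)
    end = plasmid.index(site_b)
    if end < start:
        raise ValueError("site_a must lie upstream of site_b")
    return plasmid[:start] + cassette + plasmid[end:]
```

The uniqueness check mirrors the requirement stated above; where it fails, new sites would first have to be introduced by oligonucleotide-mediated mutagenesis.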

C. Insertion of DNA into a Cloning or Expression Vehicle

The cDNA or genomic DNA encoding native or variant HRG is inserted into a
replicable vector for further cloning (amplification of the DNA) or for expression. Many
vectors are available, and selection of the appropriate vector will depend on 1) whether it is to
be used for DNA amplification or for DNA expression, 2) the size of the DNA to be inserted
into the vector, and 3) the host cell to be transformed with the vector. Each vector contains
various components depending on its function (amplification of DNA or expression of DNA)
and the host cell for which it is compatible. The vector components generally include, but are
not limited to, one or more of the following: a signal sequence, an origin of replication, one or
more marker genes, an enhancer element, a promoter, and a transcription termination
sequence.

(i) Signal Sequence Component

In general, the signal sequence may be a component of the vector, or it may be a part
of HRG DNA that is inserted into the vector. The native HRG DNA is believed to encode a
signal sequence at the amino terminus (5' end of the DNA encoding HRG) of the polypeptide
that is cleaved during post-translational processing of the polypeptide to form the mature
HRG polypeptide ligand that binds to the p185HER2 receptor, although a conventional signal
structure is not apparent. Native proHRG is secreted from the cell but may remain lodged in
the membrane because it contains a transmembrane domain and a cytoplasmic region in the
carboxyl terminal region of the polypeptide. Thus, in a secreted, soluble version of HRG the
carboxyl terminal domain of the molecule, including the transmembrane domain, is ordinarily
deleted. This truncated variant HRG polypeptide may be secreted from the cell, provided that
the DNA encoding the truncated variant encodes a signal sequence recognized by the host.

HRG of this invention may be expressed not only directly, but also as a fusion with a
heterologous polypeptide, preferably a signal sequence or other polypeptide having a specific
cleavage site at the N- and/or C-terminus of the mature protein or polypeptide. In general, the
signal sequence may be a component of the vector, or it may be a part of HRG DNA that is
inserted into the vector. Included within the scope of this invention are HRG with the native
signal sequence deleted and replaced with a heterologous signal sequence. The heterologous
signal sequence selected should be one that is recognized and processed, i.e., cleaved by a
signal peptidase, by the host cell. For prokaryotic host cells that do not recognize and
process the native HRG signal sequence, the signal sequence is substituted by a prokaryotic
signal sequence selected, for example, from the group of the alkaline phosphatase,
penicillinase, lpp, or heat-stable enterotoxin II leaders. For yeast secretion the native HRG
signal sequence may be substituted by the yeast invertase, alpha factor, or acid phosphatase
leaders. In mammalian cell expression the native signal sequence is satisfactory, although
other mammalian signal sequences may be suitable.

(ii) Origin of Replication Component

Both expression and cloning vectors generally contain a nucleic acid sequence that
enables the vector to replicate in one or more selected host cells. Generally, in cloning vectors
this sequence is one that enables the vector to replicate independently of the host
chromosomal DNA, and includes origins of replication or autonomously replicating sequences.
Such sequences are well known for a variety of bacteria, yeast, and viruses. The origin of
replication from the plasmid pBR322 is suitable for most Gram-negative bacteria, the 2μ
plasmid origin is suitable for yeast, and various viral origins (SV40, polyoma, adenovirus,
VSV or BPV) are useful for cloning vectors in mammalian cells. Generally, the origin of
replication component is not needed for mammalian expression vectors (the SV40 origin may
typically be used only because it contains the early promoter).

Most expression vectors are 'shuttle' vectors, i.e., they are capable of replication in
at least one class of organisms but can be transfected into another organism for expression.
For example, a vector is cloned in E. coli and then the same vector is transfected into yeast
or mammalian cells for expression even though it is not capable of replicating independently of
the host cell chromosome.

DNA may also be amplified by insertion into the host genome. This is readily
accomplished using Bacillus species as hosts, for example, by including in the vector a DNA
sequence that is complementary to a sequence found in Bacillus genomic DNA. Transfection
of Bacillus with this vector results in homologous recombination with the genome and insertion
of HRG DNA. However, the recovery of genomic DNA encoding HRG is more complex than
that of an exogenously replicated vector because restriction enzyme digestion is required to
excise HRG DNA. DNA can be amplified by PCR and directly transfected into the host cells
without any replication component.

(iii) Selection Gene Component

Expression and cloning vectors should contain a selection gene, also termed a
selectable marker. This gene encodes a protein necessary for the survival or growth of
transformed host cells grown in a selective culture medium. Host cells not transformed with
the vector containing the selection gene will not survive in the culture medium. Typical
selection genes encode proteins that (a) confer resistance to antibiotics or other toxins, e.g.,
ampicillin, neomycin, methotrexate, or tetracycline, (b) complement auxotrophic deficiencies,
or (c) supply critical nutrients not available from complex media, e.g., the gene encoding
D-alanine racemase for Bacilli.

One example of a selection scheme utilizes a drug to arrest growth of a host cell.
Those cells that are successfully transformed with a heterologous gene express a protein
conferring drug resistance and thus survive the selection regimen. Examples of such dominant
selection use the drugs neomycin (Southern et al., J. Molec. Appl. Genet., 1: 327, 1982),
mycophenolic acid (Mulligan et al., Science, 209: 1422, 1980) or hygromycin (Sugden et al., Mol.
Cell. Biol., 5: 410-413, 1985). The three examples given above employ bacterial genes under
eukaryotic control to convey resistance to the appropriate drug G418 or neomycin (geneticin),
xgpt (mycophenolic acid), or hygromycin, respectively.

Another example of suitable selectable markers for mammalian cells are those that
enable the identification of cells competent to take up HRG nucleic acid, such as dihydrofolate
reductase (DHFR) or thymidine kinase. The mammalian cell transformants are placed under
selection pressure which only the transformants are uniquely adapted to survive by virtue of
having taken up the marker. Selection pressure is imposed by culturing the transformants
under conditions in which the concentration of selection agent in the medium is successively
changed, thereby leading to amplification of both the selection gene and the DNA that encodes
HRG. Amplification is the process by which genes in greater demand for the production of a
protein critical for growth are reiterated in tandem within the chromosomes of successive
generations of recombinant cells. Increased quantities of HRG are synthesized from the
amplified DNA.

For example, cells transformed with the DHFR selection gene are first identified by
culturing all of the transformants in a culture medium that contains methotrexate (Mtx), a
competitive antagonist of DHFR. An appropriate host cell when wild-type DHFR is employed
is the Chinese hamster ovary (CHO) cell line deficient in DHFR activity, prepared and
propagated as described by Urlaub and Chasin, Proc. Natl. Acad. Sci. USA, 77: 4216, 1980.
The transformed cells are then exposed to increased levels of methotrexate. This leads to
the synthesis of multiple copies of the DHFR gene, and, concomitantly, multiple copies of other
DNA comprising the expression vectors, such as the DNA encoding HRG. This amplification
technique can be used with any otherwise suitable host, e.g., ATCC No. CCL61 CHO-K1,
notwithstanding the presence of endogenous DHFR if, for example, a mutant DHFR gene that
is highly resistant to Mtx is employed (EP 117,060). Alternatively, host cells (particularly
wild-type hosts that contain endogenous DHFR) transformed or co-transformed with DNA
sequences encoding HRG, wild-type DHFR protein, and another selectable marker such as
aminoglycoside 3' phosphotransferase (APH) can be selected by cell growth in medium
containing a selection agent for the selectable marker such as an aminoglycosidic antibiotic,
e.g., kanamycin, neomycin, or G418 (see U.S. Pat. No. 4,965,199).

A suitable selection gene for use in yeast is the trp1 gene present in the yeast plasmid
YRp7 (Stinchcomb et al., Nature, 282: 39, 1979; Kingsman et al., Gene, 7: 141, 1979; or
Tschemper et al., Gene, 10: 157, 1980). The trp1 gene provides a selection marker for a
mutant strain of yeast lacking the ability to grow in tryptophan, for example, ATCC No.
44076 or PEP4-1 (Jones, Genetics, 85: 12, 1977). The presence of the trp1 lesion in the yeast
host cell genome then provides an effective environment for detecting transformation by
growth in the absence of tryptophan. Similarly, Leu2-deficient yeast strains (ATCC 20,622
or 38,626) are complemented by known plasmids bearing the Leu2 gene.

(iv) Promoter Component

Expression and cloning vectors usually contain a promoter that is recognized by the
host organism and is operably linked to HRG nucleic acid. Promoters are untranslated
sequences located upstream (5') to the start codon of a structural gene (generally within
about 100 to 1000 bp) that control the transcription and translation of a particular nucleic acid
sequence, such as HRG, to which they are operably linked. Such promoters typically fall into
two classes, inducible and constitutive. Inducible promoters are promoters that initiate
increased levels of transcription from DNA under their control in response to some change in
culture conditions, e.g., the presence or absence of a nutrient or a change in temperature. At
this time a large number of promoters recognized by a variety of potential host cells are well
known. These promoters are operably linked to DNA encoding HRG by removing the promoter
from the source DNA by restriction enzyme digestion and inserting the isolated promoter
sequence into the vector. Both the native HRG promoter sequence and many heterologous
promoters may be used to direct amplification and/or expression of HRG DNA. However,
heterologous promoters are preferred, as they generally permit greater transcription and
higher yields of expressed HRG as compared to the native HRG promoter.

Promoters suitable for use with prokaryotic hosts include the β-lactamase and
lactose promoter systems (Chang et al., Nature, 275: 615, 1978; and Goeddel et al., Nature,
281: 544, 1979), alkaline phosphatase, a tryptophan (trp) promoter system (Goeddel, Nucleic
Acids Res., 8: 4057, 1980 and EP 36,776) and hybrid promoters such as the tac promoter
(deBoer et al., Proc. Natl. Acad. Sci. USA, 80: 21-25, 1983). However, other known bacterial
promoters are suitable. Their nucleotide sequences have been published, thereby enabling a
skilled worker operably to ligate them to DNA encoding HRG (Siebenlist et al., Cell, 20: 269,
1980) using linkers or adaptors to supply any required restriction sites. Promoters for use in
bacterial systems also generally will contain a Shine-Dalgarno (S.D.) sequence operably linked
to the DNA encoding HRG.

Suitable promoting sequences for use with yeast hosts include the promoters for
3-phosphoglycerate kinase (Hitzeman et al., J. Biol. Chem., 255: 2073, 1980) or other glycolytic
enzymes (Hess et al., J. Adv. Enzyme Reg., 7: 149, 1968; and Holland, Biochemistry, 17: 4900,
1978), such as enolase, glyceraldehyde-3-phosphate dehydrogenase, hexokinase, pyruvate
decarboxylase, phosphofructokinase, glucose-6-phosphate isomerase, 3-phosphoglycerate
mutase, pyruvate kinase, triosephosphate isomerase, phosphoglucose isomerase, and
glucokinase.

Other yeast promoters, which are inducible promoters having the additional
advantage of transcription controlled by growth conditions, are the promoter regions for
alcohol dehydrogenase 2, isocytochrome C, acid phosphatase, degradative enzymes
associated with nitrogen metabolism, metallothionein, glyceraldehyde-3-phosphate
dehydrogenase, and enzymes responsible for maltose and galactose utilization. Suitable
vectors and promoters for use in yeast expression are further described in Hitzeman et al.,
EP 73,657A. Yeast enhancers also are advantageously used with yeast promoters.

Promoter sequences are known for eukaryotes. Virtually all eukaryotic genes have
an AT-rich region located approximately 25 to 30 bases upstream from the site where
transcription is initiated. Another sequence found 70 to 80 bases upstream from the start of
transcription of many genes is a CXCAAT (SEQ ID NO:1) region where X may be any
nucleotide. At the 3' end of most eukaryotic genes is an AATAAA sequence (SEQ ID NO:2)
that may be the signal for addition of the poly A tail to the 3' end of the coding sequence. All
of these sequences are suitably inserted into mammalian expression vectors.
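
As an illustration only, the two named motifs can be located in a candidate vector
sequence with a short script. The following is a minimal sketch in Python; the function
name find_motifs and the toy sequence are hypothetical and are not taken from any HRG
clone, and only the given strand is scanned.

```python
import re

# Motifs named in the text; "X may be any nucleotide" is modelled as [ACGT].
CXCAAT = re.compile(r"C[ACGT]CAAT")
POLY_A_SIGNAL = re.compile(r"AATAAA")


def find_motifs(seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each motif on the given strand."""
    seq = seq.upper()
    return {
        "CXCAAT": [m.start() for m in CXCAAT.finditer(seq)],
        "AATAAA": [m.start() for m in POLY_A_SIGNAL.finditer(seq)],
    }


# Hypothetical toy sequence for illustration (assumption, not an HRG sequence).
toy = "GGCCCAATGGTATAAAAGGCATGCCCTGAAATAAAGC"
hits = find_motifs(toy)
```

The reported positions are offsets within the supplied string; relating them to the
transcription start site would require that site to be known for the construct at hand.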

HRG gene transcription from vectors in mammalian host cells is controlled by
promoters obtained from the genomes of viruses such as polyoma virus, fowlpox virus (UK
2,211,504, published 5 July 1989), adenovirus (such as Adenovirus 2), bovine papilloma virus,
avian sarcoma virus, cytomegalovirus, a retrovirus, hepatitis-B virus and most preferably
Simian Virus 40 (SV40), from heterologous mammalian promoters, e.g., the actin promoter or
an immunoglobulin promoter, from heat-shock promoters, and from the promoter normally
associated with HRG sequence, provided such promoters are compatible with the host cell
systems.

The early and late promoters of the SV40 virus are conveniently obtained as an
SV40 restriction fragment that also contains the SV40 viral origin of replication (Fiers et al.,
Nature, 273: 113 (1978); Mulligan and Berg, Science, 209: 1422-1427 (1980); Pavlakis et al.,
Proc. Natl. Acad. Sci. USA, 78: 7398-7402 (1981)). The immediate early promoter of the
human cytomegalovirus is conveniently obtained as a HindIII E restriction fragment
(Greenaway et al., Gene, 18: 355-360 (1982)). A system for expressing DNA in mammalian
hosts using the bovine papilloma virus as a vector is disclosed in U.S. Pat. No. 4,419,446. A
modification of this system is described in U.S. Pat. No. 4,601,978. See also Gray et al.,
Nature, 295: 503-508 (1982) on expressing cDNA encoding immune interferon in monkey cells;
Reyes et al., Nature, 297: 598-601 (1982) on expression of human β-interferon cDNA in mouse
cells under the control of a thymidine kinase promoter from herpes simplex virus; Canaani and
Berg, Proc. Natl. Acad. Sci. USA, 79: 5166-5170 (1982) on expression of the human interferon
β1 gene in cultured mouse and rabbit cells; and Gorman et al., Proc. Natl. Acad. Sci. USA, 79:
6777-6781 (1982) on expression of bacterial CAT sequences in CV-1 monkey kidney cells,
chicken embryo fibroblasts, Chinese hamster ovary cells, HeLa cells, and mouse NIH-3T3
cells using the Rous sarcoma virus long terminal repeat as a promoter.

(v) Enhancer Element Component

Transcription of a DNA encoding HRG of this invention by higher eukaryotes is often
increased by inserting an enhancer sequence into the vector. Enhancers are cis-acting
elements of DNA, usually about from 10-300 bp, that act on a promoter to increase its
transcription. Enhancers are relatively orientation and position independent, having been found
5' (Laimins et al., Proc. Natl. Acad. Sci. USA, 78: 993, 1981) and 3' (Lusky et al., Mol. Cell
Bio., 3: 1108, 1983) to the transcription unit, within an intron (Banerji et al., Cell, 33: 729,
1983) as well as within the coding sequence itself (Osborne et al., Mol. Cell Bio., 4: 1293,
1984). Many enhancer sequences are now known from mammalian genes (globin, elastase,
albumin, α-fetoprotein and insulin). Typically, however, one will use an enhancer from a
eukaryotic cell virus. Examples include the SV40 enhancer on the late side of the replication
origin (bp 100-270), the cytomegalovirus early promoter enhancer, the polyoma enhancer on
the late side of the replication origin, and adenovirus enhancers. See also Yaniv, Nature, 297:
17-18 (1982) on enhancing elements for activation of eukaryotic promoters. The enhancer
may be spliced into the vector at a position 5' or 3' to HRG DNA, but is preferably located at
a site 5' from the promoter.

(vi) Transcription Termination Component

Expression vectors used in eukaryotic host cells (yeast, fungi, insect, plant, animal,
human, or nucleated cells from other multicellular organisms) will also contain sequences
necessary for the termination of transcription and for stabilizing the mRNA. Such sequences
are commonly available from the 5' and, occasionally, 3' untranslated regions of eukaryotic or
viral DNAs or cDNAs. These regions contain nucleotide segments transcribed as
polyadenylated fragments in the untranslated portion of the mRNA encoding HRG. The 3'
untranslated regions also include transcription termination sites.

Construction of suitable vectors containing one or more of the above listed
components and the desired coding and control sequences employs standard ligation techniques.
Isolated plasmids or DNA fragments are cleaved, tailored, and religated in the form desired to
generate the plasmids required.

For analysis to confirm correct sequences in plasmids constructed, the ligation
mixtures are used to transform E. coli K12 strain 294 (ATCC 31,446) and successful
transformants selected by ampicillin or tetracycline resistance where appropriate. Plasmids
from the transformants are prepared, analyzed by restriction endonuclease digestion, and/or
sequenced by the method of Messing et al., Nucleic Acids Res., 9: 309 (1981) or by the method
of Maxam et al., Methods in Enzymology, 65: 499 (1980).

Particularly useful in the practice of this invention are expression vectors that
provide for the transient expression in mammalian cells of DNA encoding HRG. In general,
transient expression involves the use of an expression vector that is able to replicate
efficiently in a host cell, such that the host cell accumulates many copies of the expression
vector and, in turn, synthesizes high levels of a desired polypeptide encoded by the expression
vector. Transient expression systems, comprising a suitable expression vector and a host
cell, allow for the convenient positive identification of polypeptides encoded by cloned DNAs,
as well as for the rapid screening of such polypeptides for desired biological or physiological
properties. Thus, transient expression systems are particularly useful in the invention for
purposes of identifying analogs and variants of HRG that have HRG-like activity. Such a
transient expression system is described in EP 309,237 published 29 March 1989. Other
methods, vectors, and host cells suitable for adaptation to the synthesis of HRG in
recombinant vertebrate cell culture are described in Gething et al., Nature, 293: 620-625, 1981;
Mantei et al., Nature, 281: 40-46, 1979; Levinson et al., EP 117,060 and EP 117,058. A
particularly useful expression plasmid for mammalian cell culture expression of HRG is pRK5
(EP pub. no. 307,247).

D. Selection and Transformation of Host Cells 

Suitable host cells for cloning or expressing the vectors herein are the prokaryote,
yeast, or higher eukaryote cells described above. Suitable prokaryotes include eubacteria,
such as Gram-negative or Gram-positive organisms, for example, E. coli, Bacilli such as B.
subtilis, Pseudomonas species such as P. aeruginosa, Salmonella typhimurium, or Serratia
marcescens. One preferred E. coli cloning host is E. coli 294 (ATCC 31,446), although other
strains such as E. coli B, E. coli χ1776 (ATCC 31,537), and E. coli W3110 (ATCC 27,325) are
suitable. These examples are illustrative rather than limiting. Preferably the host cell should
secrete minimal amounts of proteolytic enzymes. Alternatively, in vitro methods of cloning,
e.g., PCR or other nucleic acid polymerase reactions, are suitable.

In addition to prokaryotes, eukaryotic microbes such as filamentous fungi or yeast are
suitable hosts for HRG-encoding vectors. Saccharomyces cerevisiae, or common baker's
yeast, is the most commonly used among lower eukaryotic host microorganisms. However, a
number of other genera, species, and strains are commonly available and useful herein, such
as Schizosaccharomyces pombe (Beach and Nurse, Nature, 290: 140 (1981); EP 139,383,
published May 2, 1985), Kluyveromyces hosts (U.S.S.N. 4,943,529) such as, e.g., K. lactis
(Louvencourt et al., J. Bacteriol., 737 (1983)), K. fragilis, K. bulgaricus, K. thermotolerans, and
K. marxianus, Yarrowia (EP 402,226); Pichia pastoris (EP 183,070), Sreekrishna et al., J.
Basic Microbiol., 28: 265-278 (1988); Candida, Trichoderma reesei (EP 244,234); Neurospora
crassa (Case et al., Proc. Natl. Acad. Sci. USA, 76: 5259-5263 (1979)), and filamentous fungi
such as, e.g., Neurospora, Penicillium, Tolypocladium (WO 91/00357, published 10 January
1991), and Aspergillus hosts such as A. nidulans (Ballance et al., Biochem. Biophys. Res.
Commun., 112: 284-289 (1983); Tilburn et al., Gene, 26: 205-221 (1983); Yelton et al., Proc.
Natl. Acad. Sci. USA, 81: 1470-1474 (1984)) and A. niger (Kelly and Hynes, EMBO J., 4: 475-479
(1985)).

Suitable host cells for the expression of glycosylated HRG polypeptide are derived
from multicellular organisms. Such host cells are capable of complex processing and
glycosylation activities. In principle, any higher eukaryotic cell culture is workable, whether
from vertebrate or invertebrate culture. Examples of invertebrate cells include plant and
insect cells. Numerous baculoviral strains and variants and corresponding permissive insect
host cells from hosts such as Spodoptera frugiperda (caterpillar), Aedes aegypti (mosquito),
Aedes albopictus (mosquito), Drosophila melanogaster (fruitfly), and Bombyx mori host cells
have been identified (see, e.g., Luckow et al., Bio/Technology, 6: 47-55 (1988); Miller et al., in
Genetic Engineering, Setlow, J.K. et al., eds., Vol. 8 (Plenum Publishing, 1986), pp. 277-279; and
Maeda et al., Nature, 315: 592-594 (1985)). A variety of such viral strains are publicly
available, e.g., the L-1 variant of Autographa californica NPV and the Bm-5 strain of Bombyx
mori NPV, and such viruses may be used as the virus herein according to the present
invention, particularly for transfection of Spodoptera frugiperda cells. Plant cell cultures of
cotton, corn, potato, soybean, petunia, tomato, and tobacco can be utilized as hosts.
Typically, plant cells are transfected by incubation with certain strains of the bacterium
Agrobacterium tumefaciens, which has been previously manipulated to contain HRG DNA.
During incubation of the plant cell culture with A. tumefaciens, the DNA encoding HRG is
transferred to the plant cell host such that it is transfected, and will, under appropriate
conditions, express HRG DNA. In addition, regulatory and signal sequences compatible with
plant cells are available, such as the nopaline synthase promoter and polyadenylation signal
sequences (Depicker et al., J. Mol. Appl. Gen., 1: 561 [1982]). In addition, DNA segments
isolated from the upstream region of the T-DNA 780 gene are capable of activating or
increasing transcription levels of plant-expressible genes in recombinant DNA-containing plant
tissue (see EP 321,196, published 21 June 1989).

However, interest has been greatest in vertebrate cells, and propagation of
vertebrate cells in culture (tissue culture) has become a routine procedure in recent years
(Tissue Culture, Academic Press, Kruse and Patterson, editors (1973)). Examples of useful
mammalian host cell lines are monkey kidney CV1 line transformed by SV40 (COS-7, ATCC
CRL 1651); human embryonic kidney line (293 or 293 cells subcloned for growth in suspension
culture, Graham et al., J. Gen Virol., 36: 59, 1977); baby hamster kidney cells (BHK, ATCC
CCL 10); Chinese hamster ovary cells/-DHFR (CHO, Urlaub and Chasin, Proc. Natl. Acad.
Sci. USA, 77: 4216 [1980]); mouse sertoli cells (TM4, Mather, Biol. Reprod., 23: 243-251 [1980]);
monkey kidney cells (CV1 ATCC CCL 70); African green monkey kidney cells (VERO-76,
ATCC CRL-1587); human cervical carcinoma cells (HELA, ATCC CCL 2); canine kidney
cells (MDCK, ATCC CCL 34); buffalo rat liver cells (BRL 3A, ATCC CRL 1442); human
lung cells (W138, ATCC CCL 75); human liver cells (Hep G2, HB 8065); mouse mammary
tumor (MMT 060562, ATCC CCL51); TRI cells (Mather et al., Annals N.Y. Acad. Sci.,
383: 44-68 [1982]); MRC 5 cells; FS4 cells; and a human hepatoma cell line (Hep G2).
Preferred host cells are human embryonic kidney 293 and Chinese hamster ovary cells.

Host cells are transfected and preferably transformed with the above-described
expression or cloning vectors of this invention and cultured in conventional nutrient media
modified as appropriate for inducing promoters, selecting transformants, or amplifying the
genes encoding the desired sequences.

Transfection refers to the taking up of an expression vector by a host cell whether or
not any coding sequences are in fact expressed. Numerous methods of transfection are
known to the ordinarily skilled artisan, for example, CaPO4 and electroporation. Successful
transfection is generally recognized when any indication of the operation of this vector occurs
within the host cell.

Transformation means introducing DNA into an organism so that the DNA is
replicable, either as an extrachromosomal element or by chromosomal integration. Depending
on the host cell used, transformation is done using standard techniques appropriate to such
cells. The calcium treatment employing calcium chloride, as described in section 1.82 of
Sambrook et al., supra, is generally used for prokaryotes or other cells that contain
substantial cell-wall barriers. Infection with Agrobacterium tumefaciens is used for
transformation of certain plant cells, as described by Shaw et al., Gene, 21: 315 (1983) and
WO 89/05859, published 29 June 1989. For mammalian cells without such cell walls, the
calcium phosphate precipitation method described in sections 16.30-16.37 of Sambrook et al.,
supra, is preferred. General aspects of mammalian cell host system transformations have
been described by Axel in U.S. Pat. No. 4,399,216, issued 16 August 1983. Transformations
into yeast are typically carried out according to the method of Van Solingen et al., J. Bact.,
130: 946 (1977) and Hsiao et al., Proc. Natl. Acad. Sci. (USA), 76: 3829 (1979). However,
other methods for introducing DNA into cells such as by nuclear injection, electroporation, or
protoplast fusion may also be used.

E. Culturing the Host Cells

Prokaryotic cells used to produce HRG polypeptide of this invention are cultured in
suitable media as described generally in Sambrook et al., supra.

The mammalian host cells used to produce HRG of this invention may be cultured in a
variety of media. Commercially available media such as Ham's F10 (Sigma), Minimal
Essential Medium ([MEM], Sigma), RPMI-1640 (Sigma), and Dulbecco's Modified Eagle's
Medium ([DMEM], Sigma) are suitable for culturing the host cells. In addition, any of the media
described in Ham and Wallace, Meth. Enz., 58: 44 (1979), Barnes and Sato, Anal. Biochem.,
102: 255 (1980), U.S. Pat. Nos. 4,767,704; 4,657,866; 4,927,762; or 4,560,655; WO 90/03430;
WO 87/00195 and U.S. Pat. Re. 30,985, may be used as culture media for the host cells. Any
of these media may be supplemented as necessary with hormones and/or other growth
factors (such as insulin, transferrin, or epidermal growth factor), salts (such as sodium
chloride, calcium, magnesium, and phosphate), buffers (such as HEPES), nucleosides (such as
adenosine and thymidine), antibiotics (such as Gentamycin™ drug), trace elements (defined
as inorganic compounds usually present at final concentrations in the micromolar range), and
glucose or an equivalent energy source. Any other necessary supplements may also be
included at appropriate concentrations that would be known to those skilled in the art. The
culture conditions, such as temperature, pH, and the like, are those previously used with the
host cell selected for expression, and will be apparent to the ordinarily skilled artisan.

The host cells referred to in this disclosure encompass cells in in vitro culture as well
as cells that are within a host animal.

It is further envisioned that HRG of this invention may be produced by homologous
recombination, or with recombinant production methods utilizing control elements introduced
into cells already containing DNA encoding HRG currently in use in the field. For example, a
powerful promoter/enhancer element, a suppressor, or an exogenous transcription modulatory
element is inserted in the genome of the intended host cell in proximity and orientation
sufficient to influence the transcription of DNA encoding the desired HRG. The control
element does not encode HRG of this invention, but the DNA is present in the host cell genome.
One next screens for cells making HRG of this invention, or increased or decreased levels of
expression, as desired.

F. Detecting Gene Amplification/Expression

Gene amplification and/or expression may be measured in a sample directly, for
example, by conventional Southern blotting, Northern blotting to quantitate the transcription
of mRNA (Thomas, Proc. Natl. Acad. Sci. USA, 77: 5201-5205 [1980]), dot blotting (DNA
analysis), or in situ hybridization, using an appropriately labeled probe based on the sequences
provided herein. Various labels may be employed, most commonly radioisotopes, particularly
32P. However, other techniques may also be employed, such as using biotin-modified
nucleotides for introduction into a polynucleotide. The biotin then serves as the site for binding
to avidin or antibodies which may be labeled with a wide variety of labels, such as
radionuclides, fluorescers, enzymes, or the like. Alternatively, antibodies may be employed
that can recognize specific duplexes, including DNA duplexes, RNA duplexes, and DNA-RNA
hybrid duplexes or DNA-protein duplexes. The antibodies in turn may be labeled and the
assay may be carried out where the duplex is bound to a surface, so that upon the formation
of duplex on the surface, the presence of antibody bound to the duplex can be detected.

Gene expression, alternatively, may be measured by immunological methods, such as
immunohistochemical staining of tissue sections and assay of cell culture or body fluids, to
quantitate directly the expression of gene product. With immunohistochemical staining
techniques, a cell sample is prepared, typically by dehydration and fixation, followed by
reaction with labeled antibodies specific for the gene product coupled, where the labels are
usually visually detectable, such as enzymatic labels, fluorescent labels, luminescent labels,
and the like. A particularly sensitive staining technique suitable for use in the present
invention is described by Hsu et al., Am. J. Clin. Path., 75: 734-738 (1980).

Antibodies useful for immunohistochemical staining and/or assay of sample fluids may
be either monoclonal or polyclonal, and may be prepared in any mammal. Conveniently, the
antibodies may be prepared against a native HRG polypeptide or against a synthetic peptide
based on the DNA sequences provided herein as described further in Section 4 below.

G. Purification of The Heregulin Polypeptides

HRG is recovered from a cellular membrane fraction. Alternatively, a proteolytically
cleaved or a truncated expressed soluble HRG fragment or subdomain is recovered from the
culture medium as a soluble polypeptide.

When HRG is expressed in a recombinant cell other than one of human origin, HRG is
completely free of proteins or polypeptides of human origin. However, it is desirable to purify
HRG from recombinant cell proteins or polypeptides to obtain preparations that are
substantially homogeneous as to HRG. As a first step, the culture medium or lysate is
centrifuged to remove particulate cell debris. The membrane and soluble protein fractions are
then separated. HRG is then purified from both the soluble protein fraction (requiring the
presence of a protease) and from the membrane fraction of the culture lysate, depending on
whether HRG is membrane bound. The following procedures are exemplary of suitable
purification procedures: fractionation on immunoaffinity or ion-exchange columns; ethanol
precipitation; reversed phase HPLC; chromatography on silica, heparin sepharose or on an
anion exchange resin such as DEAE; chromatofocusing; SDS-PAGE; ammonium sulfate
precipitation; and gel filtration using, for example, Sephadex G-75.

HRG variants in which residues have been deleted, inserted or substituted are
recovered in the same fashion as the native HRG, taking account of any substantial changes
in properties occasioned by the variation. For example, preparation of a HRG fusion with
another protein or polypeptide, e.g., a bacterial or viral antigen, facilitates purification; an
immunoaffinity column containing antibody to the antigen can be used to adsorb the fusion.
Immunoaffinity columns such as a rabbit polyclonal anti-HRG column can be employed to
adsorb the HRG variant by binding it to at least one remaining immune epitope. A protease
inhibitor such as phenylmethylsulfonylfluoride (PMSF) also may be useful to inhibit proteolytic
degradation during purification, and antibiotics may be included to prevent the growth of
adventitious contaminants. One skilled in the art will appreciate that purification methods
suitable for native HRG may require modification to account for changes in the character of
HRG variants upon expression in recombinant cell culture.

H. Covalent Modifications of HRG

Covalent modifications of HRG polypeptides are included within the scope of this
invention. Both native HRG and amino acid sequence variants of HRG optionally are
covalently modified. One type of covalent modification included within the scope of this
invention is a HRG polypeptide fragment. HRG fragments, such as HRG-GDF, having up to
about 40 amino acid residues are conveniently prepared by chemical synthesis, or by
enzymatic or chemical cleavage of the full-length HRG polypeptide or HRG variant
polypeptide. Other types of covalent modifications of HRG or fragments thereof are
introduced into the molecule by reacting targeted amino acid residues of HRG or fragments
thereof with an organic derivatizing agent that is capable of reacting with selected side chains
or the N- or C-terminal residues.

Cysteinyl residues most commonly are reacted with α-haloacetates (and
corresponding amines), such as chloroacetic acid or chloroacetamide, to give carboxymethyl
or carboxyamidomethyl derivatives. Cysteinyl residues also are derivatized by reaction with
bromotrifluoroacetone, α-bromo-β-(5-imidozoyl)propionic acid, chloroacetyl phosphate,
N-alkylmaleimides, 3-nitro-2-pyridyl disulfide, methyl 2-pyridyl disulfide, p-chloromercuribenzoate,
2-chloromercuri-4-nitrophenol, or chloro-7-nitrobenzo-2-oxa-1,3-diazole.

Histidyl residues are derivatized by reaction with diethylpyrocarbonate at pH 5.5-7.0
because this agent is relatively specific for the histidyl side chain. Para-bromophenacyl
bromide also is useful; the reaction is preferably performed in 0.1 M sodium cacodylate at pH
6.0.

Lysinyl and amino terminal residues are reacted with succinic or other carboxylic acid
anhydrides. Derivatization with these agents has the effect of reversing the charge of the
lysinyl residues. Other suitable reagents for derivatizing α-amino-containing residues include
imidoesters such as methyl picolinimidate; pyridoxal phosphate; pyridoxal; chloroborohydride;
trinitrobenzenesulfonic acid; O-methylisourea; 2,4-pentanedione; and transaminase-catalyzed
reaction with glyoxylate.

Arginyl residues are modified by reaction with one or several conventional reagents,
among them phenylglyoxal, 2,3-butanedione, 1,2-cyclohexanedione, and ninhydrin.
Derivatization of arginine residues requires that the reaction be performed in alkaline
conditions because of the high pKa of the guanidine functional group. Furthermore, these
reagents may react with the groups of lysine as well as the arginine epsilon-amino group.

The specific modification of tyrosyl residues may be made, with particular interest in
introducing spectral labels into tyrosyl residues by reaction with aromatic diazonium
compounds or tetranitromethane. Most commonly, N-acetylimidizole and tetranitromethane
are used to form O-acetyl tyrosyl species and 3-nitro derivatives, respectively. Tyrosyl
residues are iodinated using 125I or 131I to prepare labeled proteins for use in
radioimmunoassay, the chloramine T method described above being suitable.

Carboxyl side groups (aspartyl or glutamyl) are selectively modified by reaction with
carbodiimides (R-N=C=N-R'), where R and R' are different alkyl groups, such as
1-cyclohexyl-3-(2-morpholinyl-4-ethyl) carbodiimide or 1-ethyl-3-(4-azonia-4,4-dimethylpentyl)
carbodiimide. Furthermore, aspartyl and glutamyl residues are converted to asparaginyl and
glutaminyl residues by reaction with ammonium ions.

Derivatization with bifunctional agents is useful for crosslinking HRG to a
water-insoluble support matrix or surface for use in the method for purifying anti-HRG antibodies,
and vice versa. Commonly used crosslinking agents include, e.g., 1,1-bis(diazoacetyl)-2-phenylethane,
glutaraldehyde, N-hydroxysuccinimide esters, for example, esters with
4-azidosalicylic acid, homobifunctional imidoesters, including disuccinimidyl esters such as
3,3'-dithiobis(succinimidylpropionate), and bifunctional maleimides such as
bis-N-maleimido-1,8-octane. Derivatizing agents such as methyl-3-[(p-azidophenyl)dithio]propioimidate yield
photoactivatable intermediates that are capable of forming crosslinks in the presence of light.
Alternatively, reactive water-insoluble matrices such as cyanogen bromide-activated
carbohydrates and the reactive substrates described in U.S. Pat. Nos. 3,969,287; 3,691,016;
4,195,128; 4,247,642; 4,229,537; and 4,330,440 are employed for protein immobilization.

Glutaminyl and asparaginyl residues are frequently deamidated to the corresponding
glutamyl and aspartyl residues, respectively. Alternatively, these residues are deamidated
under mildly acidic conditions. Either form of these residues falls within the scope of this
invention.

Other modifications include hydroxylation of proline and lysine, phosphorylation of
hydroxyl groups of seryl or threonyl residues, methylation of the α-amino groups of lysine,
arginine, and histidine side chains (T.E. Creighton, Proteins: Structure and Molecular
Properties, W.H. Freeman & Co., San Francisco, pp. 79-86 [1983]), acetylation of the
N-terminal amine, and amidation of any C-terminal carboxyl group.

HRG optionally is fused with a polypeptide heterologous to HRG. The heterologous
polypeptide optionally is an anchor sequence such as that found in the decay accelerating
system (DAF); a toxin such as ricin, pseudomonas exotoxin, gelonin, or other polypeptide that
will result in target cell death. These heterologous polypeptides are covalently coupled to HRG
through side chains or through the terminal residues. Similarly, HRG is conjugated to other
molecules toxic or inhibitory to a target mammalian cell, e.g., tricothecenes, or
antisense DNA that blocks expression of target genes.

HRG also is covalently modified by altering its native glycosylation pattern. One or
more carbohydrate substituents are modified by adding, removing or varying the
monosaccharide components at a given site, or by modifying residues in HRG such that
glycosylation sites are added or deleted.

Glycosylation of polypeptides is typically either N-linked or O-linked. N-linked refers
to the attachment of the carbohydrate moiety to the side chain of an asparagine residue.
The tri-peptide sequences asparagine-X-serine and asparagine-X-threonine, where X is any
amino acid except proline, are the recognition sequences for enzymatic attachment of the
carbohydrate moiety to the asparagine side chain. Thus, the presence of either of these tri-
peptide sequences in a polypeptide creates a potential glycosylation site. O-linked
glycosylation refers to the attachment of one of the sugars N-acetylgalactosamine,
galactose, or xylose, to a hydroxyamino acid, most commonly serine or threonine, although 5-
hydroxyproline or 5-hydroxylysine may also be used.
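
By way of illustration only, the tri-peptide rule for potential N-linked sites may be
sketched in Python as follows. The function name and the test sequence are hypothetical
and are not taken from any HRG sequence; the sketch reports candidate sites only and
does not predict whether a site is actually glycosylated in a given host cell.

```python
import re

# Asn-X-Ser/Thr, where X is any residue except proline (one-letter codes).
# The lookahead keeps the match one residue wide so overlapping sites are all found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_n_linked_sites(sequence: str) -> list[int]:
    """Return 1-based positions of asparagines that sit in an N-X-S/T tri-peptide."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up peptide: N2 (N-G-S) and N9 (N-L-T) qualify, N6 (N-P-T) does not.
    print(find_n_linked_sites("MNGSANPTNLT"))
```

A regular expression with a lookahead is used so that adjacent asparagines, as in
N-N-T-S, are each reported.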

Glycosylation sites are added to HRG by altering its amino acid sequence to contain
one or more of the above-described tri-peptide sequences (for N-linked glycosylation sites).
The alteration may also be made by the addition of, or substitution by, one or more serine or
threonine residues to HRG (for O-linked glycosylation sites). For ease, HRG is preferably
altered through changes at the DNA level, particularly by mutating the DNA encoding HRG at
preselected bases such that codons are generated that will translate into the desired amino
acids.
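
The selection of preselected bases may be illustrated, under stated assumptions, by the
following Python sketch, which picks the codon for a desired residue that differs from an
existing codon at the fewest bases. The table covers only asparagine, serine and threonine
from the standard genetic code; the function name and the example codons are hypothetical,
and host codon usage is not considered.

```python
# Standard-genetic-code codons for the residues relevant to glycosylation sites.
CODONS = {
    "N": ["AAT", "AAC"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "T": ["ACT", "ACC", "ACA", "ACG"],
}


def closest_codon(original: str, target_residue: str) -> str:
    """Return the codon for target_residue with the fewest base changes from original.

    Ties are broken alphabetically so that the result is deterministic.
    """
    original = original.upper()

    def base_changes(candidate: str) -> int:
        return sum(a != b for a, b in zip(original, candidate))

    return min(CODONS[target_residue], key=lambda c: (base_changes(c), c))


if __name__ == "__main__":
    # An alanine codon (GCT) becomes a threonine codon with a single base change.
    print(closest_codon("GCT", "T"))
```

Minimizing the number of base changes is one reasonable design criterion among several;
a practitioner might equally weigh codon preference of the expression host.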

Chemical or enzymatic coupling of glycosides to HRG increases the number of
carbohydrate substituents. These procedures are advantageous in that they do not require
production of the polypeptide in a host cell that is capable of N- and O-linked glycosylation.
Depending on the coupling mode used, the sugar(s) may be attached to (a) arginine and
histidine, (b) free carboxyl groups, (c) free sulfhydryl groups such as those of cysteine, (d)
free hydroxyl groups such as those of serine, threonine, or hydroxyproline, (e) aromatic
residues such as those of phenylalanine, tyrosine, or tryptophan, or (f) the amide group of
glutamine. These methods are described in WO 87/05330, published 11 September 1987, and in
Aplin and Wriston (CRC Crit. Rev. Biochem., pp. 259-306 [1981]).

Carbohydrate moieties present on an HRG also are removed chemically or
enzymatically. Chemical deglycosylation requires exposure of the polypeptide to the
compound trifluoromethanesulfonic acid, or an equivalent compound. This treatment results in
the cleavage of most or all sugars except the linking sugar (N-acetylglucosamine or N-
acetylgalactosamine), while leaving the polypeptide intact. Chemical deglycosylation is
described by Hakimuddin et al. (Arch. Biochem. Biophys., 259:52 [1987]) and by Edge et al.
(Anal. Biochem., 118:131 [1981]). Carbohydrate moieties are removed from HRG by a variety
of endo- and exo-glycosidases as described by Thotakura et al. (Meth. Enzymol., 138:350
[1987]).

Glycosylation added during expression in cells also is suppressed by tunicamycin as
described by Duskin et al. (J. Biol. Chem., 257:3105 [1982]). Tunicamycin blocks the formation
of protein-N-glycoside linkages.

HRG also is modified by linking HRG to various nonproteinaceous polymers, e.g.,
polyethylene glycol, polypropylene glycol or polyoxyalkylenes, in the manner set forth in U.S.
Pat. Nos. 4,640,835; 4,496,689; 4,301,144; 4,670,417; 4,791,192 or 4,179,337.

One preferred way to increase the in vivo circulating half-life of non-membrane bound
HRG is to conjugate it to a polymer that confers extended half-life, such as polyethylene
glycol (PEG) (Maxfield et al., Polymer 16:505-509 [1975]; Bailey, F. E., et al., in Nonionic
Surfactants [Schick, M. J., ed.] pp. 794-821 [1967]; Abuchowski, A. et al., J. Biol. Chem.
252:3582-3586 [1977]; Abuchowski, A. et al., Cancer Biochem. Biophys. 7:175-186 [1984]; Katre,
N.V. et al., Proc. Natl. Acad. Sci., 84:1487-1491 [1987]; Goodson, R. et al., Bio/Technology,
8:343-346 [1990]). Conjugation to PEG also has been reported to reduce
immunogenicity and toxicity (Abuchowski, A. et al., J. Biol. Chem., 252:3578-3581 [1977]).

HRG also is entrapped in microcapsules prepared, for example, by coacervation
techniques or by interfacial polymerization (for example, hydroxymethylcellulose or gelatin-
microcapsules and poly-(methylmethacrylate) microcapsules, respectively), in colloidal drug
delivery systems (for example, liposomes, albumin microspheres, microemulsions, nano-
particles and nanocapsules), or in macroemulsions. Such techniques are disclosed in
Remington's Pharmaceutical Sciences, 16th edition, Osol, A., Ed., (1980).

HRG is also useful in generating antibodies, as standards in assays for HRG (e.g., by
labeling HRG for use as a standard in a radioimmunoassay, enzyme-linked immunoassay, or
radioreceptor assay), in affinity purification techniques, and in competitive-type receptor
binding assays when labeled with radioiodine, enzymes, fluorophores, spin labels, and the like.

Those skilled in the art will be capable of screening variants in order to select the
optimal variant for the purpose intended. For example, a change in the immunological
character of HRG, such as a change in affinity for a given antigen or for the HER2 receptor,
is measured by a competitive-type immunoassay using a standard or control such as a native
HRG (in particular native HRG-GFD). Other potential modifications of protein or polypeptide
properties such as redox or thermal stability, hydrophobicity, susceptibility to proteolytic
degradation, stability in recombinant cell culture or in plasma, or the tendency to aggregate
with carriers or into multimers are assayed by methods well known in the art.

1. Therapeutic Use of Heregulin Ligands

While the role of the p185HER2 and its ligands is unknown in normal cell growth and
differentiation, it is an object of the present invention to develop therapeutic uses for the
p185HER2 ligands of the present invention in promoting normal growth and development and in
inhibiting abnormal growth, specifically in malignant or neoplastic tissues.

2. Therapeutic Compositions and Administration of HRG

Therapeutic formulations of HRG or HRG antibody are prepared for storage by
mixing the HRG protein having the desired degree of purity with optional physiologically
acceptable carriers, excipients, or stabilizers (Remington's Pharmaceutical Sciences, supra),
in the form of lyophilized cake or aqueous solutions. Acceptable carriers, excipients or
stabilizers are nontoxic to recipients at the dosages and concentrations employed, and include
buffers such as phosphate, citrate, and other organic acids; antioxidants including ascorbic
acid; low molecular weight (less than about 10 residues) polypeptides (to prevent methoxide
formation); proteins, such as serum albumin, gelatin, or immunoglobulins; hydrophilic polymers
such as polyvinylpyrrolidone; amino acids such as glycine, glutamine, asparagine, arginine or
lysine; monosaccharides, disaccharides, and other carbohydrates including glucose, mannose,
or dextrins; chelating agents such as EDTA; sugar alcohols such as mannitol or sorbitol; salt-
forming counterions such as sodium; and/or nonionic surfactants such as Tween, Pluronics or
polyethylene glycol (PEG).

HRG or HRG antibody to be used for in vivo administration must be sterile. This is
readily accomplished by filtration through sterile filtration membranes, prior to or following
lyophilization and reconstitution. HRG or antibody to an HRG ordinarily will be stored in
lyophilized form or in solution.

Therapeutic HRG, or HRG specific antibody compositions generally are placed into a
container having a sterile access port, for example, an intravenous solution bag or vial having
a stopper pierceable by a hypodermic injection needle.

HRG, its antibody or HRG variant when used as an antagonist may be optionally
combined with or administered in concert with other agents known for use in the treatment of
malignancies. When HRG is used as an agonist to stimulate the HER2 receptor, for example in
tissue cultures, it may be combined with or administered in concert with other compositions
that stimulate growth such as PDGF, FGF, EGF, growth hormone or other protein growth
factors.

The route of HRG or HRG antibody administration is in accord with known methods,
e.g., injection or infusion by intravenous, intraperitoneal, intracerebral, intramuscular,
intraocular, intraarterial, or intralesional routes, or by sustained release systems as noted
below. HRG is administered continuously by infusion or by bolus injection. HRG antibody is
administered in the same fashion, or by administration into the blood stream or lymph.

Suitable examples of sustained-release preparations include semipermeable matrices
of solid hydrophobic polymers containing the protein, which matrices are in the form of shaped
articles, e.g. films, or microcapsules. Examples of sustained-release matrices include
polyesters, hydrogels [e.g., poly(2-hydroxyethyl-methacrylate) as described by Langer et al.,
J. Biomed. Mater. Res., 15:167-277 (1981) and Langer, Chem. Tech., 12:98-105 (1982) or
poly(vinylalcohol)], polylactides (U.S. Pat. No. 3,773,919, EP 58,481), copolymers of L-
glutamic acid and gamma ethyl-L-glutamate (Sidman et al., Biopolymers, 22:547-556 [1983]),
non-degradable ethylene-vinyl acetate (Langer et al., supra), degradable lactic acid-glycolic
acid copolymers such as the Lupron Depot™ (injectable microspheres composed of lactic acid-
glycolic acid copolymer and leuprolide acetate), and poly-D-(-)-3-hydroxybutyric acid (EP
133,988). While polymers such as ethylene-vinyl acetate and lactic acid-glycolic acid enable
release of molecules for over 100 days, certain hydrogels release proteins for shorter time
periods. When encapsulated proteins remain in the body for a long time, they may denature or
aggregate as a result of exposure to moisture at 37°C, resulting in a loss of biological activity
and possible changes in immunogenicity. Rational strategies can be devised for protein
stabilization depending on the mechanism involved. For example, if the aggregation mechanism
is discovered to be intermolecular S-S bond formation through thio-disulfide interchange,
stabilization may be achieved by modifying sulfhydryl residues, lyophilizing from acidic
solutions, controlling moisture content, using appropriate additives, and developing specific
polymer matrix compositions.

Sustained-release HRG or antibody compositions also include liposomally entrapped
HRG or antibody. Liposomes containing HRG or antibody are prepared by methods known per
se: DE 3,218,121; Epstein et al., Proc. Natl. Acad. Sci. USA, 82:3688-3692 (1985); Hwang et
al., Proc. Natl. Acad. Sci. USA, 77:4030-4034 (1980); EP 52,322; EP 36,676; EP 88,046; EP
143,949; EP 142,641; Japanese patent application 83-118008; U.S. Pat. Nos. 4,485,045 and
4,544,545; and EP 102,324. Ordinarily the liposomes are of the small (about 200-800
Angstroms) unilamellar type in which the lipid content is greater than about 30 mol. %
cholesterol, the selected proportion being adjusted for the optimal HRG therapy. Liposomes
with enhanced circulation time are disclosed in U.S. Pat. No. 5,013,556.

Another use of the present invention comprises incorporating HRG polypeptide or
antibody into formed articles. Such articles can be used in modulating cellular growth and
development. In addition, cell growth and division and tumor invasion may be modulated with
these articles.

An effective amount of HRG or antibody to be employed therapeutically will depend,
for example, upon the therapeutic objectives, the route of administration, and the condition of
the patient. Accordingly, it will be necessary for the therapist to titer the dosage and modify
the route of administration as required to obtain the optimal therapeutic effect. A typical daily
dosage might range from about 1 µg/kg to up to 100 mg/kg or more, depending on the factors
mentioned above. Typically, the clinician will administer HRG or antibody until a dosage is
reached that achieves the desired effect. The progress of this therapy is easily monitored by
conventional assays.
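
The arithmetic of the stated range may be illustrated by the following Python sketch, which
converts the per-kilogram bounds of about 1 µg/kg and 100 mg/kg into absolute daily amounts
for a given body weight. The function name and the 70 kg example weight are assumptions made
for illustration; the text above contemplates amounts beyond the upper bound ("or more") and
leaves the actual dosage to titration by the therapist.

```python
def daily_dose_range_ug(weight_kg: float,
                        low_ug_per_kg: float = 1.0,
                        high_mg_per_kg: float = 100.0) -> tuple[float, float]:
    """Return (low, high) daily amounts in micrograms for the given body weight."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    low = weight_kg * low_ug_per_kg
    high = weight_kg * high_mg_per_kg * 1000.0  # 1 mg = 1000 µg
    return low, high


if __name__ == "__main__":
    low, high = daily_dose_range_ug(70.0)  # hypothetical 70 kg subject
    print(f"{low:.0f} µg to {high / 1000.0:.0f} mg per day")
```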

3. Heregulin Antibody Preparation and Therapeutic Use

The antibodies of this invention are obtained by routine screening. Polyclonal
antibodies to HRG generally are raised in animals by multiple subcutaneous (sc) or
intraperitoneal (ip) injections of HRG and an adjuvant. It may be useful to conjugate HRG or
an HRG fragment containing the target amino acid sequence to a protein that is immunogenic
in the species to be immunized, e.g., keyhole limpet hemocyanin, serum albumin, bovine
thyroglobulin, or soybean trypsin inhibitor using a bifunctional or derivatizing agent, for
example, maleimidobenzoyl sulfosuccinimide ester (conjugation through cysteine residues), N-
hydroxysuccinimide (through lysine residues), glutaraldehyde, succinic anhydride, SOCl2, or
R1N=C=NR, where R and R1 are different alkyl groups.

The route and schedule of immunizing an animal or removing and culturing antibody-
producing cells are generally in keeping with established and conventional techniques for
antibody stimulation and production. While mice are frequently immunized, it is contemplated
that any mammalian subject including human subjects or antibody-producing cells obtained
therefrom can be immunized to generate antibody producing cells.




Subjects are typically immunized against HRG or its immunogenic conjugates or
derivatives by combining 1 mg or 1 µg of HRG immunogen (for rabbits or mice, respectively)
with 3 volumes of Freund's complete adjuvant and injecting the solution intradermally at
multiple sites. One month later the subjects are boosted with 1/5 to 1/10 the original amount
of immunogen in Freund's complete adjuvant (or other suitable adjuvant) by subcutaneous
injection at multiple sites. 7 to 14 days later animals are bled and the serum is assayed for
anti-HRG antibody titer. Subjects are boosted until the titer plateaus. Preferably, the subject
is boosted with a conjugate of the same HRG, but conjugated to a different protein and/or
through a different cross-linking agent. Conjugates also can be made in recombinant cell
culture as protein fusions. Also, aggregating agents such as alum are used to enhance the
immune response.

After immunization, monoclonal antibodies are prepared by recovering immune
lymphoid cells, typically spleen cells or lymphocytes from lymph node tissue, from immunized
animals and immortalizing the cells in conventional fashion, e.g., by fusion with myeloma cells
or by Epstein-Barr (EB)-virus transformation and screening for clones expressing the desired
antibody. The hybridoma technique described originally by Kohler and Milstein, Eur. J. Immunol.
6:511 (1976) has been widely applied to produce hybrid cell lines that secrete high levels of
monoclonal antibodies against many specific antigens.

It is possible to fuse cells of one species with another. However, it is preferable that
the source of the immunized antibody producing cells and the myeloma be from the same
species.

Hybridoma cell lines producing anti-HRG are identified by screening the culture
supernatants for antibody which binds to HRG. This is routinely accomplished by conventional
immunoassays using soluble HRG preparations or by FACS using cell-bound HRG and labelled
candidate antibody.

The hybrid cell lines can be maintained in culture in vitro in cell culture media. The cell
lines of this invention can be selected and/or maintained in a composition comprising the
continuous cell line in hypoxanthine-aminopterin thymidine (HAT) medium. In fact, once the
hybridoma cell line is established, it can be maintained on a variety of nutritionally adequate
media. Moreover, the hybrid cell lines can be stored and preserved in any number of
conventional ways, including freezing and storage under liquid nitrogen. Frozen cell lines can be
revived and cultured indefinitely with resumed synthesis and secretion of monoclonal antibody.
The secreted antibody is recovered from tissue culture supernatant by conventional methods
such as precipitation, ion exchange chromatography, affinity chromatography, or the like.

The antibodies described herein are also recovered from hybridoma cell cultures by
conventional methods for purification of IgG or IgM as the case may be that heretofore have
been used to purify these immunoglobulins from pooled plasma, e.g., ethanol or polyethylene
glycol precipitation procedures. The purified antibodies are sterile filtered, and optionally are
conjugated to a detectable marker such as an enzyme or spin label for use in diagnostic
assays of HRG in test samples.

While mouse monoclonal antibodies routinely are used, the invention is not so limited; in
fact, human antibodies may be used and may prove to be preferable. Such antibodies can be
obtained by using human hybridomas (Cote et al., Monoclonal Antibodies and Cancer
Therapy, Alan R. Liss, p. 77 (1985)). Chimeric antibodies (Cabilly et al.; Morrison et al., Proc.
Natl. Acad. Sci., 81:6851 (1984); Neuberger et al., Nature 312:604 (1984); Takeda et al.,
Nature 314:452 (1985)) containing a murine anti-HRG variable region and a human constant
region of appropriate biological activity (such as ability to activate human complement and
mediate ADCC) are within the scope of this invention, as are humanized anti-HRG
antibodies produced by conventional CDR-grafting methods.

Techniques for creating recombinant DNA versions of the
antigen-binding regions of antibody molecules (known as Fab or variable regions fragments)
which bypass the generation of monoclonal antibodies are encompassed within the practice of
this invention. One extracts antibody-specific messenger RNA molecules from immune
system cells taken from an immunized subject, transcribes these into complementary DNA
(cDNA), and clones the cDNA into a bacterial expression system and selects for the desired
binding characteristic. The Scripps/Stratagene method uses a bacteriophage lambda vector
system containing a leader sequence that causes the expressed Fab protein to migrate to the
periplasmic space (between the bacterial cell membrane and the cell wall) or to be secreted.
One can rapidly generate and screen great numbers of functional Fab fragments to identify
those which bind HRG with the desired characteristics.

Antibodies specific to HRG-α, HRG-β1, HRG-β2 and HRG-β3 may be produced and
used in the manner described above. HRG-α, HRG-β1, HRG-β2 and HRG-β3 specific
antibodies of this invention preferably do not cross-react with other members of the EGF
family (Fig. 6) or with each other.

Antibodies capable of specifically binding to the HRG-NTD, HRG-GFD or HRG-CTP
are of particular interest. Also of interest are antibodies capable of specifically binding to the
proteolytic processing sites between the GFD and transmembrane domains. These antibodies
are identified by methods that are conventional per se. For example, a bank of candidate
antibodies capable of binding to HRG-ECD or proHRG are obtained by the above methods
using immunization with full proHRG. These can then be subdivided by their ability to bind to
the various HRG domains using conventional mapping techniques. Less preferably, antibodies
specific for a predetermined domain are initially raised by immunizing the subject with a
polypeptide comprising substantially only the domain in question, e.g. HRG-GFD free of NTD or
CTP polypeptides. These antibodies will not require mapping unless binding to a particular
epitope is desired.

Antibodies that are capable of binding to proteolytic processing sites are of particular
interest. They are produced either by immunizing with an HRG fragment that includes the
CTP processing site, with intact HRG, or with HRG-NTD-GFD and then screening for the
ability to block or inhibit proteolytic processing of HRG into the NTD-GFD fragment by
recombinant host cells or isolated cell lines that are otherwise capable of processing HRG to
the fragment. These antibodies are useful for suppressing the release of NTD-GFD and
therefore are promising for use in preventing the release of NTD-GFD and stimulation of the
HER-2 receptor. They also are useful in controlling cell growth and replication. Anti-GFD
antibodies are useful for the same reasons, but may not be as efficient biologically as
antibodies directed against a processing site.

Antibodies are selected that are capable of binding only to one of the members of the
HRG family, e.g. HRG-alpha or any one of the HRG-beta isoforms. Since each of the HRG
family members has a distinct GFD-transmembrane domain cleavage site, antibodies directed
specifically against these unique sequences will enable the highly specific inhibition of each of
the GFDs or processing sites, and thereby refine the desired biological response. For example,
breast carcinoma cells which are HER-2 dependent may in fact be activated only by a single
GFD isotype or, if not, the activating GFD may originate only from a particular processing
sequence, either on the HER-2 bearing cell itself or on a GFD-generating cell. The identification
of the target activating GFD or processing site is a straight-forward matter of analyzing
HER-2 dependent carcinomas, e.g., by analyzing the tissues for the presence of a particular
GFD family member associated with the receptor, or by analyzing the tissues for expression
of an HRG family member (which then would serve as the therapeutic target). These
selective antibodies are produced in the same fashion as described above, either by
immunization with the target sequence or domain, or by selecting from a bank of antibodies
having broader specificity.

As described above, the antibodies should have high specificity and affinity for the
target sequence. For example, the antibodies directed against GFD sequences should have
greater affinity for the GFD than GFD has for the HER-2 receptor. Such antibodies are
selected by routine screening methods.
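
The affinity criterion stated above may be sketched in Python, assuming that affinities are
expressed as dissociation constants (Kd), so that a lower Kd denotes a higher affinity. The
class, the function, the antibody names and all numerical values are hypothetical and serve
only to illustrate the comparison; no measured affinities are implied.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    kd_nm: float  # Kd of the antibody for the GFD, in nM (lower = tighter binding)


def select_candidates(candidates: list[Candidate],
                      gfd_receptor_kd_nm: float) -> list[Candidate]:
    """Keep antibodies that bind the GFD more tightly than the GFD binds the receptor.

    The survivors are returned with the highest-affinity antibody first.
    """
    kept = [c for c in candidates if c.kd_nm < gfd_receptor_kd_nm]
    return sorted(kept, key=lambda c: c.kd_nm)


if __name__ == "__main__":
    bank = [Candidate("mAb-A", 0.5), Candidate("mAb-B", 5.0), Candidate("mAb-C", 0.1)]
    # Hypothetical GFD-receptor Kd of 1.0 nM: mAb-B is rejected.
    print([c.name for c in select_candidates(bank, gfd_receptor_kd_nm=1.0)])
```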

4. Non-Therapeutic Uses of Heregulin and its Antibodies

The nucleic acid encoding HRG may be used as a diagnostic for tissue specific typing.
For example, such procedures as in situ hybridization, and Northern and Southern blotting, and
PCR analysis may be used to determine whether DNA and/or RNA encoding HRG are
present in the cell type(s) being evaluated. In particular, the nucleic acid may be useful as a
specific probe for certain types of tumor cells such as, for example, mammary gland, gastric
and colon adenocarcinomas, salivary gland and other tissues containing the p185HER2.

Isolated HRG may be used in quantitative diagnostic assays as a standard or control
against which samples containing unknown quantities of HRG may be compared.

Isolated HRG may be used as a growth factor for in vitro cell culture, and in vivo to
promote the growth of cells containing p185HER2 or other analogous receptors.

HRG antibodies are useful in diagnostic assays for HRG expression in specific cells or
tissues. The antibodies are labeled in the same fashion as HRG described above and/or are
immobilized on an insoluble matrix.

HRG antibodies also are useful for the affinity purification of HRG from recombinant
cell culture or natural sources. HRG antibodies that do not detectably cross-react with other
HRG can be used to purify HRG free from other known ligands or contaminating protein.

Suitable diagnostic assays for HRG and its antibodies are well known per se. Such
assays include competitive and sandwich assays, and steric inhibition assays. Competitive
and sandwich methods employ a phase-separation step as an integral part of the method
while steric inhibition assays are conducted in a single reaction mixture. Fundamentally, the
same procedures are used for the assay of HRG and for substances that bind HRG, although
certain methods will be favored depending upon the molecular weight of the substance being
assayed. Therefore, the substance to be tested is referred to herein as an analyte,
irrespective of its status otherwise as an antigen or antibody, and proteins that bind to the
analyte are denominated binding partners, whether they be antibodies, cell surface receptors,
or antigens.

Analytical methods for HRG or its antibodies all use one or more of the following
reagents: labeled analyte analogue, immobilized analyte analogue, labeled binding partner,
immobilized binding partner and steric conjugates. The labeled reagents also are known as
"tracers."

The label used (and this is also useful to label HRG encoding nucleic acid for use as a
probe) is any detectable functionality that does not interfere with the binding of analyte and
its binding partner. Numerous labels are known for use in immunoassay, examples including
moieties that may be detected directly, such as fluorochrome, chemiluminescent, and
radioactive labels, as well as moieties, such as enzymes, that must be reacted or derivatized
to be detected. Examples of such labels include the radioisotopes 32P, 14C, 125I, 3H, and
131I, fluorophores such as rare earth chelates or fluorescein and its derivatives, rhodamine
and its derivatives, dansyl, umbelliferone, luciferases, e.g., firefly luciferase and bacterial
luciferase (U.S. Pat. No. 4,737,456), luciferin, 2,3-dihydrophthalazinediones, horseradish
peroxidase (HRP), alkaline phosphatase, β-galactosidase, glucoamylase, lysozyme,
saccharide oxidases, e.g., glucose oxidase, galactose oxidase, and glucose-6-phosphate
dehydrogenase, heterocyclic oxidases such as uricase and xanthine oxidase, coupled with an
enzyme that employs hydrogen peroxide to oxidize a dye precursor such as HRP,
lactoperoxidase, or microperoxidase, biotin/avidin, spin labels, bacteriophage labels, stable
free radicals, and the like.

Conventional methods are available to bind these labels covalently to proteins or
polypeptides. For instance, coupling agents such as dialdehydes, carbodiimides, dimaleimides,
bis-imidates, bis-diazotized benzidine, and the like may be used to tag the antibodies with the
above-described fluorescent, chemiluminescent, and enzyme labels. See, for example, U.S.
Pat. Nos. 3,940,475 (fluorimetry) and 3,645,090 (enzymes); Hunter et al., Nature, 144:945
(1962); David et al., Biochemistry, 13:1014-1021 (1974); Pain et al., J. Immunol. Methods,
40:219-230 (1981); and Nygren, J. Histochem. and Cytochem., 30:407-412 (1982). Preferred
labels herein are enzymes such as horseradish peroxidase and alkaline phosphatase. The
conjugation of such label, including the enzymes, to the antibody is a standard manipulative
procedure for one of ordinary skill in immunoassay techniques. See, for example, O'Sullivan et
al., "Methods for the Preparation of Enzyme-antibody Conjugates for Use in Enzyme
Immunoassay," in Methods in Enzymology, ed. J.J. Langone and H. Van Vunakis, Vol. 73
(Academic Press, New York, New York, 1981), pp. 147-166. Such bonding methods are
suitable for use with HRG or its antibodies, all of which are proteinaceous.

Immobilization of reagents is required for certain assay methods. Immobilization
entails separating the binding partner from any analyte that remains free in solution. This
conventionally is accomplished by either insolubilizing the binding partner or analyte analogue
before the assay procedure, as by adsorption to a water-insoluble matrix or surface (Bennich
et al., U.S. Pat. No. 3,720,760), by covalent coupling (for example, using glutaraldehyde cross-
linking), or by insolubilizing the partner or analogue afterward, e.g., by immunoprecipitation.

Other assay methods, known as competitive or sandwich assays, are well
established and widely used in the commercial diagnostics industry.

Competitive assays rely on the ability of a tracer analogue to compete with the test
sample analyte for a limited number of binding sites on a common binding partner. The binding
partner generally is insolubilized before or after the competition and then the tracer and
analyte bound to the binding partner are separated from the unbound tracer and analyte.
This separation is accomplished by decanting (where the binding partner was preinsolubilized)
or by centrifuging (where the binding partner was precipitated after the competitive reaction).
The amount of test sample analyte is inversely proportional to the amount of bound tracer as
measured by the amount of marker substance. Dose-response curves with known amounts of
analyte are prepared and compared with the test results to quantitatively determine the
amount of analyte present in the test sample. These assays are called ELISA systems when
enzymes are used as the detectable markers.
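
The comparison of a test result with a dose-response curve may be illustrated by the
following Python sketch. It assumes a standard curve in which the bound-tracer signal falls
strictly as the analyte concentration rises, and it interpolates linearly in log
concentration between the two bracketing standards. The interpolation scheme, the function
name and the standard values are assumptions chosen for illustration; curve-fitting models
used in practice may differ.

```python
import math


def interpolate_analyte(standards: list[tuple[float, float]], signal: float) -> float:
    """Estimate analyte concentration from a bound-tracer signal.

    standards: (concentration, signal) pairs with positive concentrations; the
    signal must decrease strictly as concentration increases.
    """
    points = sorted(standards)  # ascending concentration, hence descending signal
    for (c_low, s_high), (c_high, s_low) in zip(points, points[1:]):
        if s_low <= signal <= s_high:
            fraction = (s_high - signal) / (s_high - s_low)
            log_c = math.log10(c_low) + fraction * (math.log10(c_high) - math.log10(c_low))
            return 10 ** log_c
    raise ValueError("signal lies outside the range of the standard curve")


if __name__ == "__main__":
    # Hypothetical standards: concentration (ng/ml) versus bound tracer (counts).
    curve = [(1.0, 9000.0), (10.0, 6000.0), (100.0, 3000.0), (1000.0, 1000.0)]
    print(round(interpolate_analyte(curve, 4500.0), 2))
```

A signal outside the range spanned by the standards raises an error rather than being
extrapolated, which mirrors the usual practice of diluting and re-assaying such samples.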

Another species of competitive assay, called a "homogeneous" assay, does not
require a phase separation. Here, a conjugate of an enzyme with the analyte is prepared and
used such that when anti-analyte binds to the analyte the presence of the anti-analyte
modifies the enzyme activity. In this case, HRG or its immunologically active fragments are
conjugated with a bifunctional organic bridge to an enzyme such as peroxidase. Conjugates
are selected for use with anti-HRG so that binding of the anti-HRG antibody inhibits or
potentiates the enzyme activity of the label. This method per se is widely practiced under the
name of EMIT.

Steric conjugates are used in steric hindrance methods for homogeneous assay.
These conjugates are synthesized by covalently linking a low-molecular-weight hapten to a
small analyte so that antibody to hapten substantially is unable to bind the conjugate at the
same time as anti-analyte. Under this assay procedure the analyte present in the test
sample will bind anti-analyte, thereby allowing anti-hapten to bind the conjugate, resulting in a
change in the character of the conjugate hapten, e.g., a change in fluorescence when the
hapten is a fluorophore.

Sandwich assays particularly are useful for the determination of HRG or HRG
antibodies. In sequential sandwich assays an immobilized binding partner is used to adsorb
test sample analyte, the test sample is removed as by washing, the bound analyte is used to
adsorb labeled binding partner, and bound material is then separated from residual tracer.
The amount of bound tracer is directly proportional to test sample analyte. In "simultaneous"
sandwich assays the test sample is not separated before adding the labeled binding partner.
A sequential sandwich assay using an anti-HRG monoclonal antibody as one antibody and a
polyclonal anti-HRG antibody as the other is useful in testing samples for HRG activity.
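
Because bound tracer in a sandwich assay rises in direct proportion to the analyte, a
straight-line calibration may be used within the linear range of the assay. The Python
sketch below fits such a line to standards by ordinary least squares and inverts it for a
test sample. The function names and the numerical values are hypothetical, and the linear
model is an assumption that holds only where the binding partner is not saturated.

```python
def fit_line(concentrations: list[float], signals: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of signal = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def analyte_from_signal(signal: float, slope: float, intercept: float) -> float:
    """Invert the calibration line to estimate analyte concentration."""
    return (signal - intercept) / slope


if __name__ == "__main__":
    # Hypothetical standards: the intercept represents background signal.
    slope, intercept = fit_line([0.0, 1.0, 2.0, 4.0], [50.0, 250.0, 450.0, 850.0])
    print(analyte_from_signal(650.0, slope, intercept))
```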

The foregoing are merely exemplary diagnostic assays for HRG and antibodies.
Other methods now or hereafter developed for the determination of these analytes are
included within the scope hereof, including the bioassays described above.

HRG polypeptides may be used for affinity purification of receptors such as the
p185HER2 and other similar receptors that have a binding affinity for HRG, and more
specifically HRG-α, HRG-β1, HRG-β2 and HRG-β3. HRG-α, HRG-β1, HRG-β2 and HRG-β3
may be used to form fusion polypeptides wherein the HRG portion is useful for affinity binding
to nucleic acids and to heparin.

HRG polypeptides may be used as ligands for competitive screening of potential
agonists or antagonists for binding to p185HER2. HRG variants are useful as standards or
controls in assays for HRG provided that they are recognized by the analytical system
employed, e.g. an anti-HRG antibody. Antibody capable of binding to denatured HRG or a
fragment thereof is employed in assays in which HRG is denatured prior to assay, and in this
assay the denatured HRG or fragment is used as a standard or control. Preferably, HRG-α,
HRG-β1, HRG-β2 and HRG-β3 are detectably labelled and a competition assay for bound
p185HER2 is conducted using standard assay procedures.

The methods and procedures described herein with HRG-α may be applied similarly to
HRG-β1, HRG-β2 and HRG-β3 and to other novel HRG ligands and to their variants. The
following examples are offered by way of illustration and not by way of limitation.

EXAMPLES 

Example 1

Preparation of Breast Cancer Cell Supernatants
Heregulin-α was isolated from the supernatant of the human breast carcinoma MDA-MB-231.
HRG was released into and isolated from the cell culture medium.
a. Cell Culture

MDA-MB-231 human breast carcinoma cells, obtainable from the American Type
Culture Collection (ATCC HTB 26), were initially scaled up from 25 cm2 tissue culture flasks
to 890 cm2 plastic roller bottles (Corning, Corning, NY) by serial passaging and the seed train
was maintained at the roller bottle scale. To passage the cells and maintain the seed train,
flasks and roller bottles were first rinsed with phosphate buffered saline (PBS) and then
incubated with trypsin/EDTA (Sigma, St. Louis, MO) for 1-3 minutes at 37°C. The detached
cells were then pipetted several times in fresh culture medium containing fetal bovine serum
(FBS) (Gibco, Grand Island, NY) to break up cell clumps and to inactivate the trypsin. The
cells were finally split at a ratio of 1:10 into fresh medium, transferred into new flasks or
bottles, incubated at 37°C, and allowed to grow until nearly confluent. The growth medium in
which the cells were maintained was a combined DME/Ham's F-12 medium formulation
modified with respect to the concentrations of some amino acids, vitamins, sugars, and salts,
and supplemented with 5% FBS. The same basal medium is used for the serum-free ligand
production and is supplemented with 0.5% Primatone RL (Sheffield, Norwich, NY).

b. Large Scale Production

Large scale MDA-MB-231 cell growth was obtained by using Percell Biolytica
microcarriers (Hyclone Laboratories, Logan, UT) made of weighted cross-linked gelatin. The
microcarriers were first hydrated, autoclaved, and rinsed according to the manufacturer's
recommendations. Cells from 10 roller bottles were trypsinized and added into an inoculation
spinner vessel which contained three liters of growth medium and 10-20 g of hydrated
microcarriers. The cells were stirred gently for about one hour and transferred to a ten-liter
instrumented fermenter containing seven liters of growth medium. The culture was agitated
at 65-75 rpm to maintain the microcarriers in suspension. The fermenter was controlled at
37°C and the pH was maintained at 7.0-7.2 by the addition of sodium carbonate and CO2. Air
and oxygen gases were sparged to maintain the culture at about 40% of air saturation. The
cell population was monitored microscopically with a fluorescent vital stain (fluorescein
diacetate) and compared to trypan blue staining to assess the relative cell viability and the
degree of microcarrier invasion by the cells. Changes in cell-microcarrier aggregate size were
monitored by microscopic photography.

Once the microcarriers appeared 90-100% confluent, the culture was washed with
serum-free medium to remove the serum. This was accomplished by stopping the agitation
and other controls to allow the carriers to settle to the bottom of the vessel. Approximately
nine liters of the culture supernatant were pumped out of the vessel and replaced with an
equal volume of serum-free medium (the same basal medium described above, supplemented
either with or without Primatone RL). The microcarriers were briefly resuspended and the
process was repeated until a 1000-fold removal of FBS was achieved. The cells were then
incubated in the serum-free medium for 3-5 days. The glucose concentration in the culture
was monitored daily and supplemented with additions of glucose as needed to maintain the
concentration in the fermenter at or above 1 g/L. At the time of harvest, the microcarriers
were settled as described above and the supernatant was aseptically removed and stored at
2-8°C for purification. Fresh serum-free medium was replaced into the fermenter, the
microcarriers were resuspended, and the culture was incubated and harvested as before.
This procedure could be repeated four times.
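
The number of settle-and-replace cycles needed to reach a 1000-fold removal of serum follows from simple dilution arithmetic. The Python sketch below assumes a working volume of about ten liters (three liters of inoculum plus seven liters of growth medium) and ideal mixing; both are assumptions made for illustration, and the function name is hypothetical.

```python
def wash_cycles(total_l, removed_l, target_fold):
    """Count medium-exchange cycles needed to dilute serum by target_fold.

    Each cycle removes removed_l liters of supernatant from a culture of
    total_l liters and refills with serum-free medium (ideal mixing assumed).
    """
    if not 0 < removed_l < total_l:
        raise ValueError("removed volume must be between 0 and the total volume")
    per_cycle = total_l / (total_l - removed_l)  # dilution factor per cycle
    fold, cycles = 1.0, 0
    while fold < target_fold:
        fold *= per_cycle
        cycles += 1
    return cycles, fold


# Assumed 10 L working volume, 9 L exchanged per cycle, 1000-fold target.
cycles, achieved_fold = wash_cycles(10.0, 9.0, 1000.0)
```

Under these assumptions each exchange gives a ten-fold dilution, so three exchanges reach the stated 1000-fold removal.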

Example 2
Purification of Growth Factor Activity
Conditioned media (10-20 liters) from MDA-MB-231 cells was clarified by
centrifugation at 10,000 rpm in a Sorvall Centrifuge, filtered through a 0.22 micron filter and
then concentrated 10-50 (approx. 25) fold with a Minitan Tangential Flow Unit (Millipore
Corp.) with a 10 kDa cutoff polysulfone membrane at room temperature. Alternatively,
media was concentrated with a 2.5 L Amicon Stirred Cell at 4°C with a YM3 membrane.
After concentration, the media was again centrifuged at 10,000 rpm and the supernatant
frozen in 35-50 ml aliquots at -80°C.

Heparin Sepharose was purchased from Pharmacia (Piscataway, NJ) and was
prepared according to the directions of the manufacturer. Five milliliters of the resin was
packed into a column and was extensively washed (100 column volumes) and equilibrated with
phosphate buffered saline (PBS). The concentrated conditioned media was thawed, filtered
through a 0.22 micron filter to remove particulate material and loaded onto the
heparin-Sepharose column at a flow rate of 1 ml/min. The normal load consisted of 30-50 ml of
40-fold concentrated media. After loading, the column was washed with PBS until the
absorbance at 280 nm returned to baseline before elution of protein was begun. The column
was eluted at 1 ml/min with successive salt steps of 0.3 M, 0.6 M, 0.9 M and (optionally) 2.0 M
NaCl prepared in PBS. Each step was continued until the absorbance returned to baseline,
usually 6-10 column volumes. Fractions of 1 milliliter volume were collected. All of the
fractions corresponding to each wash or salt step were pooled and stored for subsequent
assay in the MDA-MB-453 cell assay.

The majority of the tyrosine phosphorylation stimulatory activity was found in the
0.6 M NaCl pool, which was used for the next step of purification. Active fractions from the
heparin-Sepharose chromatography were thawed, diluted three fold with deionized (MilliQ)
water to reduce the salt concentration and loaded onto a polyaspartic acid column (PolyCAT
A, 4.6 x 100 mm, PolyLC, Columbia, MD) equilibrated in 17 mM Na phosphate, pH 6.8. All
buffers for this purification step contained 30% ethanol to improve the resolution of protein on
this column. After loading, the column was washed with equilibration buffer and was eluted
with a linear salt gradient from 0.3 M to 0.6 M NaCl in 17 mM Na phosphate, pH 6.8, buffer.
The column was loaded and developed at 1 ml/min and 1 ml fractions were collected during the
gradient elution. Fractions were stored at 4°C. Multiple heparin-Sepharose and PolyCAT
columns were processed in order to obtain sufficient material for the next purification step. A
typical absorbance profile from a PolyCAT A column is shown in Figure 1. Aliquots of 10-25 µl
were taken from each fraction for assay and SDS gel analysis.

Tyrosine phosphorylation stimulatory activity was found throughout the eluted
fractions of the PolyCAT A column with a majority of the activity found in the fractions
corresponding to peak C of the chromatogram (salt concentration of approximately 0.45 M
NaCl). These fractions were pooled and adjusted to 0.1% trifluoroacetic acid (TFA) by
addition of 0.1 volume of 1% TFA. Two volumes of deionized water were added to dilute the
ethanol and salt from the previous step and the sample was subjected to further purification
on high pressure liquid chromatography (HPLC) utilizing a C4 reversed phase column
(SynChropak RP-4, 4.6 x 100 mm) equilibrated in a buffer consisting of 0.1% TFA in water
with 15% acetonitrile. The HPLC procedure was carried out at room temperature with a
flow rate of 1 ml/min. After loading of the sample, the column was re-equilibrated in 0.1%
TFA/15% acetonitrile. A gradient of acetonitrile was established such that over a 10 minute
period of time the acetonitrile concentration increased from 15 to 25% (1%/min).
Subsequently, the column was developed with a gradient from 25 to 40% acetonitrile over 60
min (0.25%/min). Fractions of 1 ml were collected, capped to prevent evaporation, and
stored at 4°C. Aliquots of 10 to 50 µl were taken, reduced to dryness under vacuum
(SpeedVac), and reconstituted with assay buffer (PBS with 0.1% bovine serum albumin) for
the tyrosine phosphorylation assay. Additionally, aliquots of 10 to 50 µl were taken and dried
as above for analysis by SDS gel electrophoresis. A typical HPLC profile is shown in Figure
2.
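
The two-segment acetonitrile program described above can be written as a piecewise-linear function of time, which also reproduces the stated slopes. The Python sketch below is illustrative; it assumes that time zero is the start of the first gradient segment, and the names used are hypothetical.

```python
# (start_min, end_min, start_percent, end_percent) for each gradient segment
SEGMENTS = [
    (0.0, 10.0, 15.0, 25.0),   # 15 to 25% acetonitrile over 10 minutes
    (10.0, 70.0, 25.0, 40.0),  # 25 to 40% acetonitrile over the next 60 minutes
]


def segment_slope(segment):
    """Return the gradient slope in percent acetonitrile per minute."""
    start, end, c0, c1 = segment
    return (c1 - c0) / (end - start)


def acetonitrile_percent(t_min):
    """Return the programmed acetonitrile percentage at time t_min."""
    for start, end, c0, c1 in SEGMENTS:
        if start <= t_min <= end:
            return c0 + (c1 - c0) * (t_min - start) / (end - start)
    raise ValueError("time is outside the programmed gradient")


slopes = [segment_slope(s) for s in SEGMENTS]
```

At the stated flow rate of 1 ml/min with 1 ml fractions, one fraction is collected per minute, so the function gives the approximate acetonitrile concentration for a fraction once the offset between injection and gradient start is known.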

A major peak of activity was found in fraction 17 (Figure 2B). By SDS gel analysis,
fraction 17 was found to contain a single major protein species which comigrated with the
45,000 dalton molecular weight standard (Figs. 2C, 3). In other preparations, the presence of
the 45,000 dalton protein comigrated with the stimulation of tyrosine phosphorylation activity
in the MDA-MB-453 cell assay. The chromatographic properties of the 45,000 dalton protein
were atypical; in contrast to many other proteins in the preparation, the 45,000 dalton protein
did not elute from the reversed phase column within 2 or 3 fractions. Instead, it was eluted
over 5-10 fractions. This is possibly due to extensive post-translational modifications.

a. Protein Sequence Determination

Fractions containing the 45,000 dalton protein were dried under vacuum for amino acid
sequencing. Samples were redissolved in 70% formic acid and loaded into an Applied
Biosystems, Inc. Model 470A vapor phase sequencer for N-terminal sequencing. No
discernible N-terminal sequence was obtained, suggesting that the N-terminal residue was
blocked. Similar results were obtained when the protein was first run on an SDS gel,
transblotted to ProBlott membrane and the 45,000 dalton band excised after localization by
rapid staining with Coomassie Brilliant Blue.

Internal amino acid sequence was obtained by subjecting fractions containing the
45,000 dalton protein to partial digestion using either cyanogen bromide, to cleave at
methionine residues, Lysine-C to cleave at the C-terminal side of lysine residues, or Asp-N to
cleave at the N-terminal side of aspartic acid residues. Samples after digestion were
sequenced directly or the peptides were first resolved by HPLC chromatography on a
Synchrom C4 column (4000 Å, 2 x 100 mm) equilibrated in 0.1% TFA and eluted with a
1-propanol gradient in 0.1% TFA. Peaks from the chromatographic run were dried under
vacuum before sequencing.

Upon sequencing of the peptide in the peak designated number 15 (lysine C-15),
several amino acids were found on each cycle of the run. After careful analysis, it was clear
that the fraction contained the same basic peptide with several different N-termini, giving rise
to the multiple amino acids in each cycle. After deconvolution, the following sequence was
determined (SEQ ID NO:3):

[A]AEKEKTF[C]VNGGEXFMVKDLXNP
1 5 10 15 20
(Residues in brackets were uncertain while an X represents a cycle in which it was not
possible to identify the amino acid.)

The initial yield was 8.5 pmoles. This sequence comprising 24 amino acids did not
correspond to any previously known protein. Residue 1 was later found from the cDNA
sequence to be Cys and residue 9 was found to be correct. The unknown amino acids at
positions 15 and 22 were found to be Cys and Cys, respectively.

Sequencing on samples after cyanogen bromide and Asp-N digestions, but without
separation by HPLC, was performed to corroborate the cDNA sequence. The sequences
obtained are given in Table I and confirm the sequence for the 45,000 dalton protein deduced
from the cDNA sequence. The N-terminus of the protein appears to be blocked with an unknown
blocking group. On one occasion, direct sequencing of the 45,000 dalton band from a PVDF
blot revealed this sequence with a very small initial yield (0.2 pmole) (SEQ ID NO:4):
X E X K E (G) (R) G K (G) K (G) K K K E X G X G (K)
(Residues which could not be determined are represented by 'X', while tentative residues are
in parentheses.) This corresponds to a sequence starting at the serine at position 46 near the
present N-terminus of the HRG cDNA sequence; this suggests that the N-terminus of the 45,000
dalton protein is at or before this point in the sequence.

Example 3

Cloning and Sequencing of Human Heregulin
The cDNA cloning of the p185HER2 ligand was accomplished as follows. A portion of
the lysine C-15 peptide amino acid sequence was decoded in order to design a probe for
cDNAs encoding the 45 kD HRG-α ligand. The following 39 residue long, eight-fold degenerate
deoxyoligonucleotide corresponding to the amino acid sequence (SEQ ID NO:5)
NH2-...AEKEKTFXVNGGE was chemically synthesized (SEQ ID NO:6):
3' GCTGAGAAGGAGAAGACCTTCTGT/CGTGAAT/CGGA/CGGCGAG 5'.
The unknown amino acid residue designated by X in the amino acid sequence was assigned as
cysteine for design of the probe. This probe was radioactively phosphorylated and employed
to screen by low stringency hybridization an oligo dT primed cDNA library constructed from
human MDA-MB-231 cell mRNA in λgt10 (Huynh et al., 1984, In DNA Cloning, Vol 1: A
Practical Approach (D. Glover, ed) pp. 49-78, IRL Press, Oxford). Two positive clones
designated λgt10her16 and λgt10her13 were identified. DNA sequence analysis revealed that
these two clones were identical.
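
The stated properties of the probe (39 residues, eight-fold degeneracy, encoding AEKEKTFXVNGGE with X taken as cysteine) can be checked by expanding the degenerate positions and translating each variant. The Python sketch below is illustrative only. It assumes that the printed sequence is read left to right as sense-strand codons and that "T/C" denotes a position carrying either T or C; the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

PROBE = "GCTGAGAAGGAGAAGACCTTCTGT/CGTGAAT/CGGA/CGGCGAG"


def parse_degenerate(spec):
    """Turn 'X/Y' notation into one string of allowed bases per position."""
    positions, i = [], 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "/":
            positions.append(spec[i] + spec[i + 2])
            i += 3
        else:
            positions.append(spec[i])
            i += 1
    return positions


def expand(positions):
    """List every concrete oligonucleotide covered by the degenerate probe."""
    return ["".join(choice) for choice in product(*positions)]


def translate(dna):
    """Translate a DNA string codon by codon using the standard code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


positions = parse_degenerate(PROBE)
variants = expand(positions)
peptides = {translate(v) for v in variants}
```

All three degenerate positions fall in the third base of a codon, so every variant encodes the same peptide and the degeneracy only accommodates codon choice.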

The 2010 basepair cDNA nucleotide sequence of λgt10her16 (Fig. 4) contains a single
long open reading frame of 669 amino acids beginning with alanine at nucleotide positions 3-5
and ending with glutamine at nucleotide positions 2007-2009. No stop codon was found in the
translated sequence; however, later analysis of heregulin β-type clones indicates that
methionine encoded at nucleotide positions 135-137 was the initiating methionine. Nucleotide
sequence homology with the probe is found between and including bases 681-719. Homology
between those amino acids encoded by the probe and those flanking the probe with the amino
acid sequence determined for the lysine C-15 fragment verifies that the isolated clone encodes
at least the lysine C-15 fragment of the 45 kD protein.
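
The nucleotide coordinates given above can be related to amino acid numbering by simple reading-frame arithmetic. The Python sketch below assumes only what the text states, namely that residue 1 is encoded at nucleotides 3-5; the function names are hypothetical.

```python
ORF_START = 3  # first nucleotide of the codon for residue 1 (alanine)


def residue_of(nucleotide):
    """Return the residue number whose codon contains this nucleotide."""
    offset = nucleotide - ORF_START
    if offset < 0:
        raise ValueError("nucleotide lies upstream of the reading frame")
    return offset // 3 + 1


def codon_span(residue):
    """Return the (first, last) nucleotide positions of a residue's codon."""
    first = ORF_START + 3 * (residue - 1)
    return first, first + 2


last_residue = residue_of(2007)            # final codon of the reading frame
met_residue = residue_of(135)              # proposed initiating methionine
probe_residues = (residue_of(681), residue_of(719))
```

On this numbering the reading frame ends at residue 669, the methionine at nucleotides 135-137 is residue 45, and the 39-base probe region spans thirteen residues, 227 through 239.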

Hydropathy analysis shows the existence of a strongly hydrophobic amino acid region 
including residues 287-309 (Fig. 4) indicating that this protein contains a transmembrane or 
internal signal sequence domain and thus is anchored to the membrane of the cell. 

The 669 amino acid sequence encoded by the 2010 bp cDNA sequence contains
potential sites for asparagine-linked glycosylation (Winzler, R. in Hormonal Proteins and
Peptides (Li, C.H., ed.) pp. 1-15, Academic Press, New York (1973)) at positions asparagine
164, 170, 208, 437 and 609. A potential O-glycosylation site (Marshall, R.D. (1974) Biochem.
Soc. Symp. 40:17-26) is presented in the region including a cluster of serine and threonine
residues at amino acid positions 209-218. Three sites of potential glycosaminoglycan addition
(Goldstein, L.A., et al. (1989) Cell 56:1063-1072) are positioned at the serine-glycine dipeptides
occurring at amino acids 42-43, 64-65 and 151-152. Glycosylation probably accounts for the
discrepancy between the calculated MW of about 26 kD for the NTD-GFD (extracellular)
region of HRG and the observed MW of about 45 kD for purified HRG.
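
Candidate sites of this kind are commonly located by scanning a sequence for short motifs. The Python sketch below is a hedged illustration and not the method used herein: it assumes the conventional N-X-S/T sequon (X not proline) for asparagine-linked glycosylation and the Ser-Gly dipeptide mentioned above, and it runs on a short hypothetical sequence rather than on the sequence of Fig. 4.

```python
import re


def n_glyc_sites(seq):
    """1-based positions of Asn in N-X-S/T motifs where X is not proline."""
    # A lookahead is used so that overlapping motifs are all reported.
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", seq)]


def ser_gly_sites(seq):
    """1-based positions of Ser in Ser-Gly dipeptides."""
    return [m.start() + 1 for m in re.finditer(r"(?=SG)", seq)]


# Hypothetical toy sequence for illustration only.
toy = "MNGSAPSGKNPTQNATSG"
sequons = n_glyc_sites(toy)
sg_pairs = ser_gly_sites(toy)
```

In the toy sequence the asparagine at position 10 is skipped because it is followed by proline, which is the usual exclusion applied to the sequon.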

This amino acid sequence shares a number of features with the epidermal growth
factor (EGF) family of transmembrane bound growth factors (Carpenter, G., and Cohen, S.
(1979) Ann. Rev. Biochem. 48: 193-216; Massague, J. (1990) J. Biol. Chem. 265: 21393-21396)
including 1) the existence of a proform of each growth factor from which the mature form is
proteolytically released (Gray, A., Dull, T.J., and Ullrich, A. (1983) Nature 303, 722-725; Bell,
G.I. et al., (1986) Nuc. Acid Res., 14: 8427-8446; Derynck, R. et al. (1984) Cell 38: 287-297); 2)
the conservation of six cysteine residues characteristically positioned over a span of
approximately 40 amino acids (the EGF-like structural motif) (Savage, R.C., et al. (1973) J.
Biol. Chem. 248: 7669-7672; HRG-α cysteines 226, 234, 240, 254, 256 and 265); and 3) the
existence of a transmembrane domain occurring proximally on the carboxy-terminal side of
the EGF homologous region (Fig. 4 and 6).

Alignment of the amino acid sequences in the region of the EGF motif and flanking
transmembrane domain of several human EGF related proteins (Fig. 6) shows that between
the first and sixth cysteine of the EGF motif HRG is most similar (50%) to the heparin binding
EGF-like growth factor (HB-EGF) (Higashiyama, S. et al. (1991) Science 251: 936-939). In
this same region HRG is ~35% homologous to amphiregulin (AR) (Plowman, G.D. et al., (1990)
Mol. Cell. Biol. 10: 1969-1981), ~32% homologous to transforming growth factor α (TGF-α)
(8), 27% homologous with EGF (Bell, G.I. et al., (1986) Nuc. Acid Res., 14: 8427-8446); and
39% homologous to the schwannoma-derived growth factor (Kimura, H., et al., Nature,
348:257-260, 1990). Disulfide linkages between cysteine residues in the EGF motif have been
determined for EGF (Savage, R.C. et al. (1973) J. Biol. Chem. 248: 7669-7672). These
disulfides define the secondary structure of this region and demarcate three loops. By
numbering the cysteines beginning with 1 on the amino-terminal end, loop 1 is delineated by
cysteines 1 and 3; loop 2 by cysteines 2 and 4; and loop 3 by cysteines 5 and 6. Although the
exact disulfide configuration in the region for the other members of the family has not been
determined, the strict conservation of the six cysteines, as well as several other residues, i.e.,
glycine 238 and 262 and arginine at position 264, indicates that they too most likely have the
same arrangement. HRG-α and EGF both have 13 amino acids in loop 1. HB-EGF,
amphiregulin (AR) and TGF-α have 12 amino acids in loop 1. Each member has 10 residues in
loop 2 except HRG-α which has 13. All five members have 8 residues in the third loop.
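
The loop sizes quoted for HRG-α can be compared with the cysteine positions given earlier (226, 234, 240, 254, 256 and 265). The Python sketch below counts the residues lying strictly between two cysteines. The counting convention is an assumption; in particular, the figure of 13 residues for loop 2 is reproduced by the interval between the third and fourth cysteines, which is an inference from the numbers and is not stated in the text.

```python
HRG_ALPHA_CYS = [226, 234, 240, 254, 256, 265]  # positions quoted in the text


def residues_between(first, second):
    """Number of residues lying strictly between two positions."""
    return second - first - 1


c1, c2, c3, c4, c5, c6 = HRG_ALPHA_CYS

loop1 = residues_between(c1, c3)           # cysteines 1 and 3
loop3 = residues_between(c5, c6)           # cysteines 5 and 6
between_c3_c4 = residues_between(c3, c4)   # matches the stated loop 2 size
between_c2_c4 = residues_between(c2, c4)   # full span of cysteines 2 and 4
```

The first and third values agree with the stated 13 and 8 residues, while the full span between cysteines 2 and 4 is longer than 13 because it overlaps loop 1.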

EGF, AR, HB-EGF and TGF-α are all newly synthesized as membrane anchored
proteins by virtue of their transmembrane domains. The proproteins are subsequently
processed to yield mature active molecules. In the case of TGF-α there is evidence that the
membrane associated proforms of the molecules are also biologically active (Brachmann,
R., et al. (1989) Cell 56: 691-700), a trait that may also be the case for HRG-α. EGF is
synthesized as a 1168 amino acid transmembrane bound proEGF that is cleaved on the
amino-terminal end between arginine 970 and asparagine 971 and at the carboxy-terminal end
between arginine 1023 and histidine 1024 (Carpenter, G., and Cohen, S. (1979) Ann. Rev.
Biochem. 48: 193-216) to yield the 53 amino acid mature EGF molecule containing the three
loop, 3 disulfide bond signature structure. The 252 amino acid proAR is cleaved between
aspartic acid 100 and serine 101 and between lysine 184 and serine 185 to yield an 84 amino
acid form of mature AR, and a 78 amino acid form is generated by NH2-terminal cleavage
between glutamine 106 and valine 107 (Plowman, G.D. et al., (1990) Mol. Cell. Biol. 10:
1969-1981). HB-EGF is processed from its 208 amino acid primary translation product to its
proposed 84 amino acid form by cleavage between arginine 73 and valine 74 and a second site
approximately 84 amino acids away in the carboxy-terminal direction (Higashiyama, S., et
al., and Klagsbrun, M. (1991) Science 251: 936-939). The 160 amino acid proform of TGF-α is
processed to a mature 50 amino acid protein by cleavages between alanine 39 and valine 40
on one side and downstream cleavage between alanine 89 and valine 90 (Derynck et al.,
(1984) Cell 38: 287-297). For each of the above described molecules COOH-terminal
processing occurs in the area bounded by the sixth cysteine of the EGF motif and the
beginning of the transmembrane domain.
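
The mature lengths quoted above follow directly from the cleavage positions. The Python sketch below recomputes them as a consistency check; the dictionary layout and function name are hypothetical, and HB-EGF is omitted because its second cleavage site is given only approximately.

```python
def mature_length(n_cut_after, c_cut_after):
    """Length of the fragment released by cleavage after each given residue.

    The fragment runs from residue n_cut_after + 1 through c_cut_after.
    """
    return c_cut_after - n_cut_after


# (residue preceding the N-terminal cut, residue preceding the C-terminal cut)
CLEAVAGES = {
    "EGF": (970, 1023),
    "AR (long form)": (100, 184),
    "AR (short form)": (106, 184),
    "TGF-alpha": (39, 89),
}

lengths = {name: mature_length(*cuts) for name, cuts in CLEAVAGES.items()}
```

The computed values of 53, 84, 78 and 50 residues agree with the mature sizes stated for EGF, the two amphiregulin forms and TGF-α.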

The residues between the first and sixth cysteines of HRGs are most similar (45%)
to heparin-binding EGF-like growth factor (HB-EGF). In this same region they are 35%
identical to amphiregulin (AR), 32% identical to TGF-α, and 27% identical with EGF. Outside
of the EGF motif there is little similarity between HRGs and other members of the EGF
family. EGF, AR, HB-EGF and TGF-α are all derived from membrane anchored proproteins
which are processed on both sides of the EGF structural unit, yielding 50-84 amino acid mature
proteins (16-19). Like other EGF family members, the HRGs appear to be derived from a
membrane-bound proform but require only a single cleavage, C-terminal to the cysteine
cluster, to produce mature protein.

HRG may exert its biological function by binding to its receptor and triggering the
transduction of a growth modulating signal. This it may accomplish as a soluble molecule or
perhaps as its membrane anchored form such as is sometimes the case with TGF-α
(Brachmann, R., et al., (1989) Cell 56: 691-700). Conversely, or in addition to stimulating
signal transduction, HRG may be internalized by a target cell where it may then interact with
the controlling regions of other regulatory genes and thus directly deliver its message to the
nucleus of the cell. The possibility that HRG mediates some of its effects by a mechanism
such as this is suggested by the fact that a potential nuclear location signal (Roberts,
Biochem-Biophys Acta (1989) 1008: 283-280) exists in the region around the three lysine
residues at positions 58-60 (Fig. 4).

The isolation of full-length cDNA of HRG-α is accomplished by employing the DNA
sequence of Fig. 4 to select additional cDNA sequences from the cDNA library constructed
from human MDA-MB-231. Full-length cDNA clones encoding HRG-α are obtained by
identifying cDNAs encoding HRG-α longer in both the 3' and 5' directions and then splicing
together a composite of the different cDNAs. Additional cDNA libraries are constructed as
required for this purpose. Following are three types of cDNA libraries that may be
constructed: 1) oligo-dT primed where predominantly stretches of polyadenosine residues are
primed, 2) random primed using short synthetic deoxyoligonucleotides non-specific for any
particular region of the mRNA, and 3) specifically primed using short synthetic
deoxyoligonucleotides specific for a desired region of the mRNA. Methods for the isolation of
such cDNA libraries were previously described.

Example 4

Detection of HRG-α mRNA Expression by Northern Analyses

Northern blot analysis of MDA-MB-231 and SK-BR-3 cell mRNA under high stringency
conditions shows at least five hybridizing bands in MDA-MB-231 mRNA where a 6.4 kb band
predominates; other weaker bands are at 9.4, 6.9, 2.8 and 1.8 kb (Fig. 5). No hybridizing band
is seen in SK-BR-3 mRNA (this cell line overexpresses p185HER2). The existence of these
multiple messages in MDA-MB-231 cells indicates either alternative splicing of the gene,
various processing of the gene's primary transcript or the existence of a transcript of another
homologous message. One of these messages may encode a soluble non-transmembrane
bound form of HRG-α. Such messages (Fig. 5) may be used to produce cDNA encoding soluble
non-transmembrane bound forms of HRG-α.

Example 5

Cell Growth Stimulation by Heregulin-α

Several different breast cancer cell lines expressing the EGF receptor or the
p185HER2 receptor were tested for their sensitivity to growth inhibition or stimulation by ligand
preparations. The cell lines tested were: SK-BR-3 (ATCC HTB 30), a cell line which
overexpresses p185HER2; MDA-MB-468 (ATCC HTB 132), a line which overexpresses the
EGF receptor; and MCF-7 cells (ATCC HTB 22), which have a moderate level of p185HER2
expression. These cells were maintained in culture and passaged according to established cell
culture techniques. The cells were grown in a 1:1 mixture of DMEM and F-12 media with 10%
fetal bovine serum. For the assay, the stock cultures were treated with trypsin to detach the
cells from the culture dish, and dispensed at a level of about 20,000 cells/well in a ninety-six
well microtiter plate. During the course of the growth assay they were maintained in media
with 1% fetal bovine serum. The test samples were sterilized by filtration through 0.22 micron
filters and they were added to quadruplicate wells and the cells incubated for 3-5 days at
37°C. At the end of the growth period, the media was aspirated from each well and the cells
treated with crystal violet (Lewis, G. et al., Cancer Research, 47:5382-5385 [1987]). The
amount of crystal violet absorbance, which is proportional to the number of cells in each well,
was measured on a Flow Plate Reader. Values from replicate wells for each test sample
were averaged. Untreated wells on each dish served as controls. Results were expressed as
percent of growth relative to the control cells.
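
The data reduction described above (averaging replicate wells and expressing the result as a percentage of the untreated control) may be sketched as follows. The absorbance readings in this Python example are hypothetical values chosen for illustration and are not data from this specification.

```python
from statistics import mean


def percent_of_control(test_wells, control_wells):
    """Mean absorbance of the test wells as a percentage of the control mean."""
    return 100.0 * mean(test_wells) / mean(control_wells)


# Hypothetical crystal violet absorbance readings from quadruplicate wells.
control = [0.40, 0.41, 0.39, 0.40]
treated = [0.60, 0.61, 0.59, 0.60]

growth = percent_of_control(treated, control)
```

A value above 100 indicates stimulation of growth relative to the untreated wells and a value below 100 indicates inhibition.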

The purified HRG-α ligand was tested for activity in the cell growth assay and the
results are presented in Figure 7. At a concentration of approximately 1 nM ligand, both of
the cell lines expressing the p185HER2 receptor (SK-BR-3 and MCF-7) showed stimulation of
growth relative to the controls while the cell type (MDA-MB-468) expressing only the EGF
receptor did not show an appreciable response. These results were consistent with those
obtained from the autophosphorylation experiments with the various cell lines. These results
established that HRG-α ligand is specific for the p185HER2 receptor and does not show
appreciable interaction with the EGF receptor at these concentrations.

HRG does not compete with antibodies directed against the extra-cellular domain of
p185HER2, but anti-p185HER2 Mabs 2C4 and 7F3 (which are antiproliferative in their own
right) do antagonize HRG.
Example 6
Cloning and Sequencing of Heregulin-β1
The isolation of HRG-β1 cDNA was accomplished by employing a hybridizing
fragment of the DNA sequence encoding HRG-α to select additional cDNA sequences from
the cDNA library constructed from human MDA-MB-231 cells. Clone λher11.1dbl
(heregulin-β1) was identified in a λgt10 oligo-dT primed cDNA library derived from MDA-MB-231
poly A+ mRNA. Radioactively labelled synthetic DNA probes corresponding to the 5' and 3' ends of
λher16 (HRG-α) were employed in a hybridization reaction under high stringency conditions to
isolate the λher11.1dbl clone. The DNA nucleotide sequence of the λher11.1dbl clone is shown
in Figure 8 (SEQ ID NO:9). The HRG-β1 amino acid sequence is homologous to HRG-α from its
amino-terminal end at position Asp 15 of HRG-α through the 3' end of HRG-α except at the
positions described below. In addition, HRG-β1 encoding DNA extends 189 base pairs longer
than λher16 in the 3' direction and supplies a stop codon after Val 675. At nucleotide position
247 of λher11.1dbl there is a G substituted for A, thereby resulting in the substitution of
Gln(Q) in place of Arg(R) in HRG-β1 as shown in the second line of Figure 9 (SEQ ID NO:8
and SEQ ID NO:9).

In the area of the EGF motif there are additional differences between HRG-α and
HRG-β1. These differences are illustrated below in an expanded view of the homology
between HRG-α and HRG-β1 in the region of the EGF motif or the GFD (growth factor
domain). The specific sequence shown corresponds to HRG-α amino acids 221-286 shown in
Figure 9. Asterisks indicate identical residues in the comparison below (SEQ ID NO:10 and
SEQ ID NO:11).

HEREGULIN-α    SHLVKCAEKEKTFCVNGGEC
HEREGULIN-β1   ********************

HEREGULIN-α    FMVKDLSNPSRYLCKCQPGF
HEREGULIN-β1   ****************PNE*

HEREGULIN-α    TGARCTENVPMKVQNQEK--
HEREGULIN-β1   **D**QNY*MASFYKHLGIE

HEREGULIN-α    ---AEELYQKR (-Transmembrane)
HEREGULIN-β1   FME******** (-Transmembrane)

Example 7
Expression of Heregulins in E. coli
HRG-α and HRG-β1 have been expressed in E. coli using the DNA sequences of
Figures 4 and 8 encoding heregulin under the control of the alkaline phosphatase promoter and
the STII leader sequence. In the initial characterization of heregulin activity, the
natural amino and carboxy termini of the heregulin molecule were not precisely defined.
However, after comparison of heregulin to EGF and TGF-α sequences, we expected that
shortened forms of heregulin starting around Ser 221 and ending around Glu 277 of Figure 4
may have biological activity. Analogous regions of all heregulins may be identified and
expressed. One shortened form was constructed to have an N-terminal Asp residue followed
by the residues 221 to 277 of HRG-α. Due to an accidental frame shift mutation following Glu
277, the HRG-α sequence was extended by 13 amino acids on the carboxy terminal end. Thus,
the carboxy-terminal end was Glu 277 of HRG-α followed by the thirteen amino acid sequence
RPNARLPPGVFYC (SEQ ID NO:20).
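
The overall length of the shortened HRG-α construct follows from its three stated parts: the added N-terminal Asp, residues 221 to 277, and the thirteen-residue frame-shift extension. The Python sketch below simply adds these up; the function name and argument layout are hypothetical.

```python
FRAMESHIFT_EXTENSION = "RPNARLPPGVFYC"  # SEQ ID NO:20 as quoted in the text


def construct_length(first_residue, last_residue, n_terminal_extra=1,
                     c_terminal_extension=FRAMESHIFT_EXTENSION):
    """Total residues: added N-terminal residue(s) + core + C-terminal extension."""
    core = last_residue - first_residue + 1
    return n_terminal_extra + core + len(c_terminal_extension)


core_length = 277 - 221 + 1
total_length = construct_length(221, 277)
```

On these figures the heregulin-derived core is 57 residues and the expressed polypeptide, before any processing, is 71 residues long.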

Expression of this construct was induced by growth of the cells in phosphate depleted
medium for about 20 hours. Recombinant protein was purified by harvesting cell paste and
resuspending in 10 mM Tris (pH 8), homogenizing, incubating at 4°C for 40 minutes,
followed by centrifuging at 15 K rpm (Sorvall). The supernatant was concentrated on a 30K
ultrafiltration membrane (Amicon) and the filtrate was applied to a MonoQ column
equilibrated in 10 mM Tris pH 8. The flow-through fractions from the MonoQ column were
adjusted to 0.05% TFA (trifluoroacetic acid) and subjected to C4 reversed phase HPLC.
Elution was with a gradient of 10-25% acetonitrile in 0.1% TFA/H2O. The solvent was
removed by lyophilization and purified protein was resuspended in 0.1% bovine serum albumin
in phosphate buffered saline. Figure 10 depicts HER2 receptor autophosphorylation data with
MCF-7 cells in response to the purified E. coli-derived protein. This material demonstrated full
biological activity with an EC50 of 0.8 nM. The purified material was also tested in the cell
growth assays (Example 5) and was found to be a potent stimulator of cell growth.

The recombinant expression vector for synthesis of HRG-β1 was constructed in a
manner similar to HRG-α. The expression vector contained DNA encoding HRG-β1 amino
acids from Ser207 through Leu273 (Figure 4). This DNA encoding HRG-β1 was recombinantly
spliced into the expression vector downstream from the alkaline phosphatase promoter and
STII leader sequence. An additional serine residue was spliced on the carboxy terminus as a
result of the recombinant construction process. The expression vector encoding HRG-β1 was
used to transform E. coli and expressed in phosphate depleted medium. Induced E. coli were
pelleted, resuspended in 10 mM Tris (pH 7.5) and sonicated. Cell debris was pelleted by
centrifugation and the supernatant was filtered through a sterile filter before assay. The
expression of HRG-β1 was confirmed by the detection of protein having the ability to
stimulate autophosphorylation of the HER2 receptor in MCF-7 cells.

A similar expression vector was constructed as described for HRG-β1 (above) with a
C-terminal tyrosine residue instead of the serine residue. This vector was transformed into
E. coli and expressed as before. Purification of this recombinant protein was achieved as
described for recombinant HRG-α. Mass spectrometric analysis revealed that the purified
protein consisted of forms which were shorter than expected. Amino acid sequencing showed
that the protein had the desired N-terminal residue (Ser) but it was found by mass
spectrometry to be truncated at the C terminus. The majority (>80%) of the protein
consisted of a form 51 amino acids long with a C-terminal methionine (MET 271) (SEQ ID
NO:9). A small amount of a shorter form (49 residues) truncated at VAL 269 was also
detected. However, both the shortened forms showed full biological activity in the HER2
receptor autophosphorylation assay.

ISOLATION OF HEREGULIN β2 AND β3 VARIANTS

Heregulin-β2 and -β3 variants were isolated in order to obtain cDNA clones that
extend further in the 5' direction. A specifically primed cDNA library was constructed in
λgt10 by employing the chemically synthesized antisense primer
3' CCTTCCCGTTCTTCTTCCTCGCTCC (SEQ ID NO:21). This primer is located
between nucleotides 167-190 in the sequence of λher16 (Figure 4). The isolation of clone
λ5'her13 (not to be confused with λher13) was achieved by hybridizing a synthetic DNA
probe corresponding to the 5' end of λher16 under high stringency conditions with the
specifically primed cDNA library. The nucleotide sequence of λ5'her13 is shown in Figure 11
(SEQ ID NO:22). The 496 base pair nucleotide sequence of λ5'her13 is homologous to the
sequence of λher16 between nucleotides 309-496 of λ5'her13 and 3-190 of λher16. λ5'her13
extends by 102 amino acids the open reading frame of λher16.
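
The coordinates quoted in this paragraph can be checked with simple arithmetic. The following Python sketch is illustrative only. It assumes that a codon of the λher16 reading frame begins at the first overlapping base (nucleotide 3 of λher16, nucleotide 309 of λ5'her13) and that the frame stays open over the whole 5' extension; neither assumption is stated in this passage, and the function names are hypothetical.

```python
# Coordinates quoted in the text (1-based, inclusive).
L5HER13_OVERLAP = (309, 496)  # region of lambda-5'her13 shared with lambda-her16
LHER16_OVERLAP = (3, 190)     # matching region of lambda-her16


def span_length(span):
    """Length of a 1-based inclusive coordinate span."""
    start, end = span
    return end - start + 1


def extra_codons(overlap_start):
    """Whole codons that fit upstream of the overlap.

    Assumes (hypothetically) that a codon starts at the first overlapping
    base and that the upstream region is entirely open reading frame.
    """
    upstream_bases = overlap_start - 1
    return upstream_bases // 3


overlap_a = span_length(L5HER13_OVERLAP)
overlap_b = span_length(LHER16_OVERLAP)
codons = extra_codons(L5HER13_OVERLAP[0])
print(overlap_a, overlap_b, codons)
```

Under these assumptions both overlap spans have the same length, and the 308 bases upstream of the overlap hold 102 complete codons, which agrees with the 102 amino acid extension stated above.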

The isolation of variant heregulin-β forms was accomplished by probing a newly
prepared oligo-dT primed λgt10 MDA-MB-231 mRNA-derived cDNA library with synthetic
probes corresponding to the 5' end of λ5'her13 and the cysteine-rich EGF-like region of λher16.
Three variants of heregulin-β were identified, isolated and sequenced. The amino acid
homologies between all heregulins are shown in Figure 15 (SEQ ID NOS:26-30).

HRG polypeptides λher76 (heregulin-β2) (SEQ ID NO:23), λher78 (heregulin-β3)
(SEQ ID NO:24) and λher84 (heregulin β2-like) (SEQ ID NO:25) are considered variants of
λher11.1dbl (heregulin-β1) because, although the deduced amino acid sequence is identical
between cysteine 1 and cysteine 6 of the EGF-like motif, their sequences diverge before the
predicted transmembrane domain, which probably begins with amino acid 248 in λher11.1dbl.
The nucleotide sequences and deduced amino acid sequences of λher76, λher78 and λher84
are shown in Figures 12, 13 and 14.

The variants each contain a TGA stop codon 148 bases 5' of the first methionine
codon in their sequences. Therefore the ATG codon at nucleotide position 135-137 of λher16
and the corresponding ATG in the other heregulin clones may be defined as the initiating
methionine (amino acid 1). Clones λher11.1dbl, λher76, λher84 and λher78 all encode
glutamine at amino acid 38 (Figure 15) whereas clone λher16 encodes arginine (Figure 4,
position 82).
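
The position of the first ATG can be reproduced from the sequence listing. The Python sketch below is a minimal illustration, not the procedure used by the inventors. It assumes that SEQ ID NO:12 corresponds to λher16 (an inference from the arginine at position 82 of its translation, not a statement made in this passage) and uses only the first 150 nucleotides of that entry. The upstream TGA codon mentioned above lies in the longer variant clones and is outside this fragment, so it is not checked here. All function names are hypothetical.

```python
# First 150 nucleotides of SEQ ID NO:12 as printed in the sequence listing.
FRAGMENT = (
    "GGGCGCGAGCGCCTCAGCGCGGCCGCTCGCTCTCCCCCTCGAGGGACAAA"
    "CTTTTCCCAAACCCGATCCGAGCCCTTGGACCAAACTCGCCTGCGCCGAG"
    "AGCCGTCCGCGTAGAGCGCTCCGTCTCCGGCGAGATGTCCGAGCGCAAAG"
)
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_atg(seq):
    """Return the 1-based position of the first ATG, or None if absent."""
    idx = seq.find("ATG")
    return None if idx < 0 else idx + 1


def upstream_in_frame_codons(seq, atg_pos):
    """Codons upstream of the ATG that share its reading frame."""
    first = (atg_pos - 1) % 3          # 0-based offset of the earliest in-frame codon
    return [seq[i:i + 3] for i in range(first, atg_pos - 1, 3)]


atg_pos = first_atg(FRAGMENT)
codons = upstream_in_frame_codons(FRAGMENT, atg_pos)
stops = [c for c in codons if c in STOP_CODONS]
print(atg_pos, len(codons), stops)
```

In this fragment the first ATG occupies positions 135-137, and 44 in-frame codons without a stop precede it, which matches the methionine numbered 45 in the clone-based numbering of SEQ ID NO:8.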

The deduced amino acid sequence of λher76 (heregulin-β2) reveals a full-length clone
encoding 637 amino acids. It shares an identical deduced amino acid sequence with λher11.1dbl
except that residues corresponding to amino acids 232-239 of λher11.1dbl have been deleted.
The deduced amino acid sequence of λher84 shows that it possesses the same amino acid
sequence as λher76 from the initiating methionine (amino acid 1, Figure 15) through the
EGF-like area and transmembrane domain. However, λher84 comes to an early stop codon at
arginine 421 (λher84 numbering). Thereafter the 3' untranslated sequence diverges. The
deduced amino acid sequence of λher78 (heregulin-β3) is homologous with heregulins-β1 and
-β2 through amino acid 230, where the sequence diverges for eleven amino acids and then
terminates. Thus heregulin-β3 has no transmembrane region. The 3' untranslated sequence is
not homologous to the other clones.
Example 9

EXPRESSION OF HEREGULIN β FORMS
In order to express heregulin-β forms in mammalian cells, full-length cDNA nucleotide
sequences from λher76 (heregulin-β2) or λher84 were subcloned into the mammalian
expression vector pRK5.1. This vector is a derivative of pRK5 that contains a
cytomegalovirus promoter followed by a 5' intron, a cloning polylinker and an SV40 early
polyadenylation signal. COS7 monkey or human kidney 293 cells were transfected and
conditioned medium was assayed in the MCF-7 cell p185/her2 autophosphorylation assay. A
positive response confirmed the expression of the cDNAs from λher76 (heregulin-β2) and
λher84 (heregulin-β3).

Supernatants from a large-scale transient expression experiment were concentrated
on a YM10 membrane (Amicon) and applied to a heparin Sepharose column as described in
Example 1. Activity (tyrosine phosphorylation assay) was detected in the 0.6 M NaCl elution
pool and was further purified on a polyaspartic acid column, as previously described. By SDS
gel analysis and activity assays, the active fractions of this column were highly purified and
contained a single band of protein with an apparent molecular weight of 45,000 daltons. Thus,
the expressed protein has chromatographic and structural properties which are very similar to
those of the native form of heregulin originally isolated from the MDA 231 cells. Small-scale
transient expression experiments with constructs made from λher84 cDNA also revealed
comparable levels of activity in the cell supernatants from this variant form. The expression
of the transmembrane-minus variant, heregulin-β3, is currently under investigation.

Example 10 

proHRG-α and proHRG-β1 cDNAs were spliced into Epstein Barr virus derived
expression vectors containing a cytomegalovirus promoter. rHRGs were purified (essentially
as described in Example 2) from the serum-free conditioned medium of stably transfected
CEN4 cells [human kidney 293 cells (ATCC No. 1573) expressing the Epstein Barr virus
EBNA-1 transactivator]. In other experiments full-length proHRG-α, -β1 and -β2 transient
expression constructs provided p185HER2 phosphorylation activity in the conditioned medium of
transfected COS7 monkey kidney cells. However, similar constructs of full-length proHRG-β3
failed to yield activity, suggesting that the hydrophobic domain missing in proHRG-β3 but
present in the other proHRGs is necessary for secretion of mature protein. Truncated
versions of proHRG-α (63 amino acids, serine 177 to tyrosine 239) and proHRG-β1 (68 amino
acids, serine 177 to tyrosine 241), each encoding the GFD structural unit and immediate
flanking regions, were also expressed in E. coli; homologous truncated versions of HRG-β3 are
expected to be expressed as active molecules. These truncated proteins were purified from
the periplasmic space and culture broth of E. coli transformed with expression vectors
designed to secrete recombinant proteins (C.N. Chang, M. Rey, B. Bochner, H. Heyneker, G.
Gray, Gene, 55:189 [1987]). These proteins also stimulated tyrosine phosphorylation of
p185HER2 but not p107HER1, indicating that the biological activity of HRG resides in the
EGF-like domain of the protein and that carbohydrate moieties are not essential for activity
in this assay. The NTD does not inhibit or suppress this activity.

Example 11 

Various human tissues were examined for the presence of HRG mRNA. Transcripts
were found in breast, ovary, testis, prostate, heart, skeletal muscle, lung, liver, kidney,
salivary gland, small intestine, and spleen but not in stomach, pancreas, uterus or placenta.
While most of these tissues display the same three classes of transcripts as the MDA-MB-231
cells (6.6 kb, 2.5 kb and 1.8 kb), only the 6.6 kb message was observed in heart and
skeletal muscle. In brain a single transcript of 2.2 kb is observed, and in testis the 6.6 kb
transcript appears along with others of 2.2 kb, 1.9 kb and 1.5 kb. The tissue-specific
expression pattern observed for HRG differs from that of p185HER2; for example, adult liver,
spleen, and brain contain HRG but not p185HER2 transcripts, whereas stomach, pancreas,
uterus and placenta contain p185HER2 transcripts but lack HRG mRNA.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Genentech, Inc.
(ii) TITLE OF INVENTION: Structure, Production and Use of Heregulin
(iii) NUMBER OF SEQUENCES: 30

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Genentech, Inc.
(B) STREET: 460 Point San Bruno Blvd
(C) CITY: South San Francisco
(D) STATE: California
(E) COUNTRY: USA
(F) ZIP: 94080

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: 5.25 inch, 360 Kb floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: patin (Genentech)

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE: 21-May-1992
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE: 11-May-1992

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 07/847743
(B) FILING DATE: 06-Mar-1992

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 07/705256
(B) FILING DATE: 24-May-1991

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 07/765212
(B) FILING DATE: 25-Sep-1991

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 07/790801
(B) FILING DATE: 08-Nov-1991

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Hensley, Max D.
(B) REGISTRATION NUMBER: 27,043
(C) REFERENCE/DOCKET NUMBER: 712P4

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 415/266-1994
(B) TELEFAX: 415/952-9881
(C) TELEX: 910/371-7168

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CNCAAT 6

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

AATAAA 6

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Ala Ala Glu Lys Glu Lys Thr Phe Cys Val Asn Gly Gly Glu Xaa
1 5 10 15

Phe Met Val Lys Asp Leu Xaa Asn Pro
20 24

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Xaa Glu Xaa Lys Glu Gly Arg Gly Lys Gly Lys Gly Lys Lys Lys
1 5 10 15

Glu Xaa Gly Xaa Gly Lys
20 21

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 13 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Ala Glu Lys Glu Lys Thr Phe Xaa Val Asn Gly Gly Glu
1 5 10 13

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 42 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

GCTGAGAAGG AGAAGACCTT CTGTCGTGAA TCGGACGGCG AG 42

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2199 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

GG GAC AAA CTT TTC CCA AAC CCG ATC CGA GCC CTT GGA 38
Asp Lys Leu Phe Pro Asn Pro Ile Arg Ala Leu Gly
1 5 10

CCA AAC TCG CCT GCG CCG AGA GCC GTC CGC GTA GAG CGC 77
Pro Asn Ser Pro Ala Pro Arg Ala Val Arg Val Glu Arg
15 20 25

TCC GTC TCC GGC GAG ATG TCC GAG CGC AAA GAA GGC AGA 116
Ser Val Ser Gly Glu Met Ser Glu Arg Lys Glu Gly Arg
30 35

GGC AAA GGG AAG GGC AAG AAG AAG GAG CGA GGC TCC GGC 155
Gly Lys Gly Lys Gly Lys Lys Lys Glu Arg Gly Ser Gly
40 45 50

AAG AAG CCG GAG TCC GCG GCG GGC AGC CAG AGC CCA GCC 194
Lys Lys Pro Glu Ser Ala Ala Gly Ser Gln Ser Pro Ala
55 60

TTG CCT CCC CAA TTG AAA GAG ATG AAA AGC CAG GAA TCG 233
Leu Pro Pro Gln Leu Lys Glu Met Lys Ser Gln Glu Ser
65 70 75

GCT GCA GGT TCC AAA CTA GTC CTT CGG TGT GAA ACC AGT 272
Ala Ala Gly Ser Lys Leu Val Leu Arg Cys Glu Thr Ser
80 85 90

TCT GAA TAC TCC TCT CTC AGA TTC AAG TGG TTC AAG AAT 311
Ser Glu Tyr Ser Ser Leu Arg Phe Lys Trp Phe Lys Asn
95 100

GGG AAT GAA TTG AAT CGA AAA AAC AAA CCA CAA AAT ATC 350
Gly Asn Glu Leu Asn Arg Lys Asn Lys Pro Gln Asn Ile
105 110 115

AAG ATA CAA AAA AAG CCA GGG AAG TCA GAA CTT CGC ATT 389
Lys Ile Gln Lys Lys Pro Gly Lys Ser Glu Leu Arg Ile
120 125

AAC AAA GCA TCA CTG GCT GAT TCT GGA GAG TAT ATG TGC 428
Asn Lys Ala Ser Leu Ala Asp Ser Gly Glu Tyr Met Cys
130 135 140

AAA GTG ATC AGC AAA TTA GGA AAT GAC AGT GCC TCT GCC 467
Lys Val Ile Ser Lys Leu Gly Asn Asp Ser Ala Ser Ala
145 150 155

AAT ATC ACC ATC GTG GAA TCA AAC GAG ATC ATC ACT GGT 506
Asn Ile Thr Ile Val Glu Ser Asn Glu Ile Ile Thr Gly
160 165

ATG CCA GCC TCA ACT GAA GGA GCA TAT GTG TCT TCA GAG 545
Met Pro Ala Ser Thr Glu Gly Ala Tyr Val Ser Ser Glu
170 175 180

TCT CCC ATT AGA ATA TCA GTA TCC ACA GAA GGA GCA AAT 584
Ser Pro Ile Arg Ile Ser Val Ser Thr Glu Gly Ala Asn
185 190

ACT TCT TCA TCT ACA TCT ACA TCC ACC ACT GGG ACA AGC 623
Thr Ser Ser Ser Thr Ser Thr Ser Thr Thr Gly Thr Ser
195 200 205

CAT CTT GTA AAA TGT GCG GAG AAG GAG AAA ACT TTC TGT 662
His Leu Val Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys
210 215 220

GTG AAT GGA GGG GAG TGC TTC ATG GTG AAA GAC CTT TCA 701
Val Asn Gly Gly Glu Cys Phe Met Val Lys Asp Leu Ser
225 230

AAC CCC TCG AGA TAC TTG TGC AAG TGC CCA AAT GAG TTT 740
Asn Pro Ser Arg Tyr Leu Cys Lys Cys Pro Asn Glu Phe
235 240 245

ACT GGT GAT CGC TGC CAA AAC TAC GTA ATG GCC AGC TTC 779
Thr Gly Asp Arg Cys Gln Asn Tyr Val Met Ala Ser Phe
250 255

TAC AAG CAT CTT GGG ATT GAA TTT ATG GAG GCG GAG GAG 818
Tyr Lys His Leu Gly Ile Glu Phe Met Glu Ala Glu Glu
260 265 270

CTG TAC CAG AAG AGA GTG CTG ACC ATA ACC GGC ATC TGC 857
Leu Tyr Gln Lys Arg Val Leu Thr Ile Thr Gly Ile Cys
275 280 285

ATC GCC CTC CTT GTG GTC GGC ATC ATG TGT GTG GTG GCC 896
Ile Ala Leu Leu Val Val Gly Ile Met Cys Val Val Ala
290 295

TAC TGC AAA ACC AAG AAA CAG CGG AAA AAG CTG CAT GAC 935
Tyr Cys Lys Thr Lys Lys Gln Arg Lys Lys Leu His Asp
300 305 310

CGT CTT CGG CAG AGC CTT CGG TCT GAA CGA AAC AAT ATG 974
Arg Leu Arg Gln Ser Leu Arg Ser Glu Arg Asn Asn Met
315 320

ATG AAC ATT GCC AAT GGG CCT CAC CAT CCT AAC CCA CCC 1013
Met Asn Ile Ala Asn Gly Pro His His Pro Asn Pro Pro
325 330 335

CCC GAG AAT GTC CAG CTG GTG AAT CAA TAC GTA TCT AAA 1052
Pro Glu Asn Val Gln Leu Val Asn Gln Tyr Val Ser Lys
340 345 350

AAC GTC ATC TCC AGT GAG CAT ATT GTT GAG AGA GAA GCA 1091
Asn Val Ile Ser Ser Glu His Ile Val Glu Arg Glu Ala
355 360

GAG ACA TCC TTT TCC ACC AGT CAC TAT ACT TCC ACA GCC 1130
Glu Thr Ser Phe Ser Thr Ser His Tyr Thr Ser Thr Ala
365 370 375

CAT CAC TCC ACT ACT GTC ACC CAG ACT CCT AGC CAC AGC 1169
His His Ser Thr Thr Val Thr Gln Thr Pro Ser His Ser
380 385

TGG AGC AAC GGA CAC ACT GAA AGC ATC CTT TCC GAA AGC 1208
Trp Ser Asn Gly His Thr Glu Ser Ile Leu Ser Glu Ser
390 395 400

CAC TCT GTA ATC GTG ATG TCA TCC GTA GAA AAC AGT AGG 1247
His Ser Val Ile Val Met Ser Ser Val Glu Asn Ser Arg
405 410 415

CAC AGC AGC CCA ACT GGG GGC CCA AGA GGA CGT CTT AAT 1286
His Ser Ser Pro Thr Gly Gly Pro Arg Gly Arg Leu Asn
420 425

GGC ACA GGA GGC CCT CGT GAA TGT AAC AGC TTC CTC AGG 1325
Gly Thr Gly Gly Pro Arg Glu Cys Asn Ser Phe Leu Arg
430 435 440

CAT GCC AGA GAA ACC CCT GAT TCC TAC CGA GAC TCT CCT 1364
His Ala Arg Glu Thr Pro Asp Ser Tyr Arg Asp Ser Pro
445 450

CAT AGT GAA AGG TAT GTG TCA GCC ATG ACC ACC CCG GCT 1403
His Ser Glu Arg Tyr Val Ser Ala Met Thr Thr Pro Ala
455 460 465

CGT ATG TCA CCT GTA GAT TTC CAC ACG CCA AGC TCC CCC 1442
Arg Met Ser Pro Val Asp Phe His Thr Pro Ser Ser Pro
470 475 480

AAA TCG CCC CCT TCG GAA ATG TCT CCA CCC GTG TCC AGC 1481
Lys Ser Pro Pro Ser Glu Met Ser Pro Pro Val Ser Ser
485 490

ATG ACG GTG TCC ATG CCT TCC ATG GCG GTC AGC CCC TTC 1520
Met Thr Val Ser Met Pro Ser Met Ala Val Ser Pro Phe
495 500 505

ATG GAA GAA GAG AGA CCT CTA CTT CTC GTG ACA CCA CCA 1559
Met Glu Glu Glu Arg Pro Leu Leu Leu Val Thr Pro Pro
510 515

AGG CTG CGG GAG AAG AAG TTT GAC CAT CAC CCT CAG CAG 1598
Arg Leu Arg Glu Lys Lys Phe Asp His His Pro Gln Gln
520 525 530

TTC AGC TCC TTC CAC CAC AAC CCC GCG CAT GAC AGT AAC 1637
Phe Ser Ser Phe His His Asn Pro Ala His Asp Ser Asn
535 540 545

AGC CTC CCT GCT AGC CCC TTG AGG ATA GTG GAG GAT GAG 1676
Ser Leu Pro Ala Ser Pro Leu Arg Ile Val Glu Asp Glu
550 555

GAG TAT GAA ACG ACC CAA GAG TAC GAG CCA GCC CAA GAG 1715
Glu Tyr Glu Thr Thr Gln Glu Tyr Glu Pro Ala Gln Glu
560 565 570

CCT GTT AAG AAA CTC GCC AAT AGC CGG CGG GCC AAA AGA 1754
Pro Val Lys Lys Leu Ala Asn Ser Arg Arg Ala Lys Arg
575 580

ACC AAG CCC AAT GGC CAC ATT GCT AAC AGA TTG GAA GTG 1793
Thr Lys Pro Asn Gly His Ile Ala Asn Arg Leu Glu Val
585 590 595

GAC AGC AAC ACA AGC TCC CAG AGC AGT AAC TCA GAG AGT 1832
Asp Ser Asn Thr Ser Ser Gln Ser Ser Asn Ser Glu Ser
600 605 610

GAA ACA GAA GAT GAA AGA GTA GGT GAA GAT ACG CCT TTC 1871
Glu Thr Glu Asp Glu Arg Val Gly Glu Asp Thr Pro Phe
615 620

CTG GGC ATA CAG AAC CCC CTG GCA GCC AGT CTT GAG GCA 1910
Leu Gly Ile Gln Asn Pro Leu Ala Ala Ser Leu Glu Ala
625 630 635

ACA CCT GCC TTC CGC CTG GCT GAC AGC AGG ACT AAC CCA 1949
Thr Pro Ala Phe Arg Leu Ala Asp Ser Arg Thr Asn Pro
640 645

GCA GGC CGC TTC TCG ACA CAG GAA GAA ATC CAG GCC AGG 1988
Ala Gly Arg Phe Ser Thr Gln Glu Glu Ile Gln Ala Arg
650 655 660

CTG TCT AGT GTA ATT GCT AAC CAA GAC CCT ATT GCT GTA TA 2029
Leu Ser Ser Val Ile Ala Asn Gln Asp Pro Ile Ala Val
665 670 675

A AACCTAAATA AACACATAGA TTCACCTGTA AAACTTTATT 2070

TTATATAATA AAGTATTCCA CCTTAAATTA AACAATTTAT TTTATTTTAG 2120

CAGTTCTGCA AATAGAAAAC AGGAAAAAAA CTTTTATAAA TTAAATATAT 2170

GTATGTAAAA ATGAAAAAAA AAAAAAAAA 2199



(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 669 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

Ala Arg Ala Pro Gln Arg Gly Arg Ser Leu Ser Pro Ser Arg Asp
1 5 10 15

Lys Leu Phe Pro Asn Pro Ile Arg Ala Leu Gly Pro Asn Ser Pro
20 25 30

Ala Pro Arg Ala Val Arg Val Glu Arg Ser Val Ser Gly Glu Met
35 40 45

Ser Glu Arg Lys Glu Gly Arg Gly Lys Gly Lys Gly Lys Lys Lys
50 55 60

Glu Arg Gly Ser Gly Lys Lys Pro Glu Ser Ala Ala Gly Ser Gln
65 70 75

Ser Pro Ala Leu Pro Pro Arg Leu Lys Glu Met Lys Ser Gln Glu
80 85 90

Ser Ala Ala Gly Ser Lys Leu Val Leu Arg Cys Glu Thr Ser Ser
95 100 105

Glu Tyr Ser Ser Leu Arg Phe Lys Trp Phe Lys Asn Gly Asn Glu
110 115 120

Leu Asn Arg Lys Asn Lys Pro Gln Asn Ile Lys Ile Gln Lys Lys
125 130 135

Pro Gly Lys Ser Glu Leu Arg Ile Asn Lys Ala Ser Leu Ala Asp
140 145 150

Ser Gly Glu Tyr Met Cys Lys Val Ile Ser Lys Leu Gly Asn Asp
155 160 165

Ser Ala Ser Ala Asn Ile Thr Ile Val Glu Ser Asn Glu Ile Ile
170 175 180

Thr Gly Met Pro Ala Ser Thr Glu Gly Ala Tyr Val Ser Ser Glu
185 190 195

Ser Pro Ile Arg Ile Ser Val Ser Thr Glu Gly Ala Asn Thr Ser
200 205 210

Ser Ser Thr Ser Thr Ser Thr Thr Gly Thr Ser His Leu Val Lys
215 220 225

Cys Ala Glu Lys Glu Lys Thr Phe Cys Val Asn Gly Gly Glu Cys
230 235 240

Phe Met Val Lys Asp Leu Ser Asn Pro Ser Arg Tyr Leu Cys Lys
245 250 255

Cys Gln Pro Gly Phe Thr Gly Ala Arg Cys Thr Glu Asn Val Pro
260 265 270

Met Lys Val Gln Asn Gln Glu Lys Ala Glu Glu Leu Tyr Gln Lys
275 280 285

Arg Val Leu Thr Ile Thr Gly Ile Cys Ile Ala Leu Leu Val Val
290 295 300

Gly Ile Met Cys Val Val Ala Tyr Cys Lys Thr Lys Lys Gln Arg
305 310 315

Lys Lys Leu His Asp Arg Leu Arg Gln Ser Leu Arg Ser Glu Arg
320 325 330

Asn Asn Met Met Asn Ile Ala Asn Gly Pro His His Pro Asn Pro
335 340 345

Pro Pro Glu Asn Val Gln Leu Val Asn Gln Tyr Val Ser Lys Asn
350 355 360

Val Ile Ser Ser Glu His Ile Val Glu Arg Glu Ala Glu Thr Ser
365 370 375

Phe Ser Thr Ser His Tyr Thr Ser Thr Ala His His Ser Thr Thr
380 385 390

Val Thr Gln Thr Pro Ser His Ser Trp Ser Asn Gly His Thr Glu
395 400 405

Ser Ile Leu Ser Glu Ser His Ser Val Ile Val Met Ser Ser Val
410 415 420

Glu Asn Ser Arg His Ser Ser Pro Thr Gly Gly Pro Arg Gly Arg
425 430 435

Leu Asn Gly Thr Gly Gly Pro Arg Glu Cys Asn Ser Phe Leu Arg
440 445 450

His Ala Arg Glu Thr Pro Asp Ser Tyr Arg Asp Ser Pro His Ser
455 460 465

Glu Arg Tyr Val Ser Ala Met Thr Thr Pro Ala Arg Met Ser Pro
470 475 480

Val Asp Phe His Thr Pro Ser Ser Pro Lys Ser Pro Pro Ser Glu
485 490 495

Met Ser Pro Pro Val Ser Ser Met Thr Val Ser Met Pro Ser Met
500 505 510

Ala Val Ser Pro Phe Met Glu Glu Glu Arg Pro Leu Leu Leu Val
515 520 525

Thr Pro Pro Arg Leu Arg Glu Lys Lys Phe Asp His His Pro Gln
530 535 540

Gln Phe Ser Ser Phe His His Asn Pro Ala His Asp Ser Asn Ser
545 550 555

Leu Pro Ala Ser Pro Leu Arg Ile Val Glu Asp Glu Glu Tyr Glu
560 565 570

Thr Thr Gln Glu Tyr Glu Pro Ala Gln Glu Pro Val Lys Lys Leu
575 580 585

Ala Asn Ser Arg Arg Ala Lys Arg Thr Lys Pro Asn Gly His Ile
590 595 600

Ala Asn Arg Leu Glu Val Asp Ser Asn Thr Ser Ser Gln Ser Ser
605 610 615

Asn Ser Glu Ser Glu Thr Glu Asp Glu Arg Val Gly Glu Asp Thr
620 625 630

Pro Phe Leu Gly Ile Gln Asn Pro Leu Ala Ala Ser Leu Glu Ala
635 640 645

Thr Pro Ala Phe Arg Leu Ala Asp Ser Arg Thr Asn Pro Ala Gly
650 655 660

Arg Phe Ser Thr Gln Glu Glu Ile Gln
665 669

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 732 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

Asp Lys Leu Phe Pro Asn Pro Ile Arg Ala Leu Gly Pro Asn Ser
1 5 10 15

Pro Ala Pro Arg Ala Val Arg Val Glu Arg Ser Val Ser Gly Glu
20 25 30

Met Ser Glu Arg Lys Glu Gly Arg Gly Lys Gly Lys Gly Lys Lys
35 40 45

Lys Glu Arg Gly Ser Gly Lys Lys Pro Glu Ser Ala Ala Gly Ser
50 55 60

Gln Ser Pro Ala Leu Pro Pro Gln Leu Lys Glu Met Lys Ser Gln
65 70 75

Glu Ser Ala Ala Gly Ser Lys Leu Val Leu Arg Cys Glu Thr Ser
80 85 90

Ser Glu Tyr Ser Ser Leu Arg Phe Lys Trp Phe Lys Asn Gly Asn
95 100 105

Glu Leu Asn Arg Lys Asn Lys Pro Gln Asn Ile Lys Ile Gln Lys
110 115 120

Lys Pro Gly Lys Ser Glu Leu Arg Ile Asn Lys Ala Ser Leu Ala
125 130 135

Asp Ser Gly Glu Tyr Met Cys Lys Val Ile Ser Lys Leu Gly Asn
140 145 150

Asp Ser Ala Ser Ala Asn Ile Thr Ile Val Glu Ser Asn Glu Ile
155 160 165

Ile Thr Gly Met Pro Ala Ser Thr Glu Gly Ala Tyr Val Ser Ser
170 175 180

Glu Ser Pro Ile Arg Ile Ser Val Ser Thr Glu Gly Ala Asn Thr
185 190 195

Ser Ser Ser Thr Ser Thr Ser Thr Thr Gly Thr Ser His Leu Val
200 205 210

Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys Val Asn Gly Gly Glu
215 220 225

Cys Phe Met Val Lys Asp Leu Ser Asn Pro Ser Arg Tyr Leu Cys
230 235 240

Lys Cys Pro Asn Glu Phe Thr Gly Asp Arg Cys Gln Asn Tyr Val
245 250 255

Met Ala Ser Phe Tyr Lys His Leu Gly Ile Glu Phe Met Glu Ala
260 265 270

Glu Glu Leu Tyr Gln Lys Arg Val Leu Thr Ile Thr Gly Ile Cys
275 280 285

Ile Ala Leu Leu Val Val Gly Ile Met Cys Val Val Ala Tyr Cys
290 295 300

Lys Thr Lys Lys Gln Arg Lys Lys Leu His Asp Arg Leu Arg Gln
305 310 315

Ser Leu Arg Ser Glu Arg Asn Asn Met Met Asn Ile Ala Asn Gly
320 325 330

Pro His His Pro Asn Pro Pro Pro Glu Asn Val Gln Leu Val Asn
335 340 345

Gln Tyr Val Ser Lys Asn Val Ile Ser Ser Glu His Ile Val Glu
350 355 360

Arg Glu Ala Glu Thr Ser Phe Ser Thr Ser His Tyr Thr Ser Thr
365 370 375

Ala His His Ser Thr Thr Val Thr Gln Thr Pro Ser His Ser Trp
380 385 390

Ser Asn Gly His Thr Glu Ser Ile Leu Ser Glu Ser His Ser Val
395 400 405

Ile Val Met Ser Ser Val Glu Asn Ser Arg His Ser Ser Pro Thr
410 415 420

Gly Gly Pro Arg Gly Arg Leu Asn Gly Thr Gly Gly Pro Arg Glu
425 430 435

Cys Asn Ser Phe Leu Arg His Ala Arg Glu Thr Pro Asp Ser Tyr
440 445 450

Arg Asp Ser Pro His Ser Glu Arg Tyr Val Ser Ala Met Thr Thr
455 460 465

Pro Ala Arg Met Ser Pro Val Asp Phe His Thr Pro Ser Ser Pro
470 475 480

Lys Ser Pro Pro Ser Glu Met Ser Pro Pro Val Ser Ser Met Thr
485 490 495

Val Ser Met Pro Ser Met Ala Val Ser Pro Phe Met Glu Glu Glu
500 505 510

Arg Pro Leu Leu Leu Val Thr Pro Pro Arg Leu Arg Glu Lys Lys
515 520 525

Phe Asp His His Pro Gln Gln Phe Ser Ser Phe His His Asn Pro
530 535 540

Ala His Asp Ser Asn Ser Leu Pro Ala Ser Pro Leu Arg Ile Val
545 550 555

Glu Asp Glu Glu Tyr Glu Thr Thr Gln Glu Tyr Glu Pro Ala Gln
560 565 570

Glu Pro Val Lys Lys Leu Ala Asn Ser Arg Arg Ala Lys Arg Thr
575 580 585

Lys Pro Asn Gly His Ile Ala Asn Arg Leu Glu Val Asp Ser Asn
590 595 600

Thr Ser Ser Gln Ser Ser Asn Ser Glu Ser Glu Thr Glu Asp Glu
605 610 615

Arg Val Gly Glu Asp Thr Pro Phe Leu Gly Ile Gln Asn Pro Leu
620 625 630

Ala Ala Ser Leu Glu Ala Thr Pro Ala Phe Arg Leu Ala Asp Ser
635 640 645

Arg Thr Asn Pro Ala Gly Arg Phe Ser Thr Gln Glu Glu Ile Gln
650 655 660

Ala Arg Leu Ser Ser Val Ile Ala Asn Gln Asp Pro Ile Ala Val
665 670 675

Xaa Asn Leu Asn Lys His Ile Asp Ser Pro Val Lys Leu Tyr Phe
680 685 690

Ile Xaa Xaa Ser Ile Pro Pro Xaa Ile Lys Gln Phe Ile Leu Phe
695 700 705

Xaa Gln Phe Cys Lys Xaa Lys Thr Gly Lys Lys Leu Leu Xaa Ile
710 715 720

Lys Tyr Met Tyr Val Lys Met Lys Lys Lys Lys Lys
725 730 732

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 66 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

Ser His Leu Val Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys Val
1 5 10 15

Asn Gly Gly Glu Cys Phe Met Val Lys Asp Leu Ser Asn Pro Ser
20 25 30

Arg Tyr Leu Cys Lys Cys Gln Pro Gly Phe Thr Gly Ala Arg Cys
35 40 45

Thr Glu Asn Val Pro Met Lys Val Gln Asn Gln Glu Lys Ala Glu
50 55 60

Glu Leu Tyr Gln Lys Arg
65 66

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 71 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

Ser His Leu Val Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys Val
1 5 10 15

Asn Gly Gly Glu Cys Phe Met Val Lys Asp Leu Ser Asn Pro Ser
20 25 30

Arg Tyr Leu Cys Lys Cys Pro Asn Glu Phe Thr Gly Asp Arg Cys
35 40 45

Gln Asn Tyr Val Met Ala Ser Phe Tyr Lys His Leu Gly Ile Glu
50 55 60

Phe Met Glu Ala Glu Glu Leu Tyr Gln Lys Arg
65 70 71
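
SEQ ID NO:10 and SEQ ID NO:11 share their N-terminal residues and then diverge. The Python sketch below is an illustration only: it tokenizes the two entries as three-letter codes (in their standard spelling, Gln and Ile) and reports where they first differ. The variable and function names are hypothetical.

```python
# Three-letter residue codes transcribed from SEQ ID NO:10 and SEQ ID NO:11.
SEQ_ID_10 = """
Ser His Leu Val Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys Val
Asn Gly Gly Glu Cys Phe Met Val Lys Asp Leu Ser Asn Pro Ser
Arg Tyr Leu Cys Lys Cys Gln Pro Gly Phe Thr Gly Ala Arg Cys
Thr Glu Asn Val Pro Met Lys Val Gln Asn Gln Glu Lys Ala Glu
Glu Leu Tyr Gln Lys Arg
""".split()

SEQ_ID_11 = """
Ser His Leu Val Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys Val
Asn Gly Gly Glu Cys Phe Met Val Lys Asp Leu Ser Asn Pro Ser
Arg Tyr Leu Cys Lys Cys Pro Asn Glu Phe Thr Gly Asp Arg Cys
Gln Asn Tyr Val Met Ala Ser Phe Tyr Lys His Leu Gly Ile Glu
Phe Met Glu Ala Glu Glu Leu Tyr Gln Lys Arg
""".split()


def first_divergence(a, b):
    """1-based index of the first differing residue, or None if none differ."""
    for position, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return position
    return None


diverge_at = first_divergence(SEQ_ID_10, SEQ_ID_11)
print(len(SEQ_ID_10), len(SEQ_ID_11), diverge_at)
print(SEQ_ID_10.count("Cys"), SEQ_ID_11.count("Cys"))
```

With these transcriptions the two entries have the stated lengths (66 and 71 residues), are identical for the first 36 residues, and first differ at residue 37. Each contains six cysteines at the same positions.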

(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2010 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

GGGCGCGAGC GCCTCAGCGC GGCCGCTCGC TCTCCCCCTC GAGGGACAAA 50
CTTTTCCCAA ACCCGATCCG AGCCCTTGGA CCAAACTCGC CTGCGCCGAG 100
AGCCGTCCGC GTAGAGCGCT CCGTCTCCGG CGAGATGTCC GAGCGCAAAG 150
AAGGCAGAGG CAAAGGGAAG GGCAAGAAGA AGGAGCGAGG CTCCGGCAAG 200
AAGCCGGAGT CCGCGGCGGG CAGCCAGAGC CCAGCCTTGC CTCCCCGATT 250
GAAAGAGATG AAAAGCCAGG AATCGGCTGC AGGTTCCAAA CTAGTCCTTC 300
GGTGTGAAAC CAGTTCTGAA TACTCCTCTC TCAGATTCAA GTGGTTCAAG 350
AATGGGAATG AATTGAATCG AAAAAACAAA CCACAAAATA TCAAGATACA 400
AAAAAAGCCA GGGAAGTCAG AACTTCGCAT TAACAAAGCA TCACTGGCTG 450
ATTCTGGAGA GTATATGTGC AAAGTGATCA GCAAATTAGG AAATGACAGT 500
GCCTCTGCCA ATATCACCAT CGTGGAATCA AACGAGATCA TCACTGGTAT 550
GCCAGCCTCA ACTGAAGGAG CATATGTGTC TTCAGAGTCT CCCATTAGAA 600
TATCAGTATC CACAGAAGGA GCAAATACTT CTTCATCTAC ATCTACATCC 650
ACCACTGGGA CAAGCCATCT TGTAAAATGT GCGGAGAAGG AGAAAACTTT 700
CTGTGTGAAT GGAGGGGAGT GCTTCATGGT GAAAGACCTT TCAAACCCCT 750
CGAGATACTT GTGCAAGTGC CAACCTGGAT TCACTGGAGC AAGATGTACT 800
GAGAATGTGC CCATGAAAGT CCAAAACCAA GAAAAGGCGG AGGAGCTGTA 850
CCAGAAGAGA GTGCTGACCA TAACCGGCAT CTGCATCGCC CTCCTTGTGG 900
TCGGCATCAT GTGTGTGGTG GCCTACTGCA AAACCAAGAA ACAGCGGAAA 950
AAGCTGCATG ACCGTCTTCG GCAGAGCCTT CGGTCTGAAC GAAACAATAT 1000
GATGAACATT GCCAATGGGC CTCACCATCC TAACCCACCC CCCGAGAATG 1050
TCCAGCTGGT GAATCAATAC GTATCTAAAA ACGTCATCTC CAGTGAGCAT 1100
ATTGTTGAGA GAGAAGCAGA GACATCCTTT TCCACCAGTC ACTATACTTC 1150
CACAGCCCAT CACTCCACTA CTGTCACCCA GACTCCTAGC CACAGCTGGA 1200
GCAACGGACA CACTGAAAGC ATCCTTTCCG AAAGCCACTC TGTAATCGTG 1250
ATGTCATCCG TAGAAAACAG TAGGCACAGC AGCCCAACTG GGGGCCCAAG 1300
AGGACGTCTT AATGGCACAG GAGGCCCTCG TGAATGTAAC AGCTTCCTCA 1350
GGCATGCCAG AGAAACCCCT GATTCCTACC GAGACTCTCC TCATAGTGAA 1400
AGGTATGTGT CAGCCATGAC CACCCCGGCT CGTATGTCAC CTGTAGATTT 1450
CCACACGCCA AGCTCCCCCA AATCGCCCCC TTCGGAAATG TCTCCACCCG 1500
TGTCCAGCAT GACGGTGTCC ATGCCTTCCA TGGCGGTCAG CCCCTTCATG 1550
GAAGAAGAGA GACCTCTACT TCTCGTGACA CCACCAAGGC TGCGGGAGAA 1600
GAAGTTTGAC CATCACCCTC AGCAGTTCAG CTCCTTCCAC CACAACCCCG 1650
CGCATGACAG TAACAGCCTC CCTGCTAGCC CCTTGAGGAT AGTGGAGGAT 1700
GAGGAGTATG AAACGACCCA AGAGTACGAG CCAGCCCAAG AGCCTGTTAA 1750
GAAACTCGCC AATAGCCGGC GGGCCAAAAG AACCAAGCCC AATGGCCACA 1800
TTGCTAACAG ATTGGAAGTG GACAGCAACA CAAGCTCCCA GAGCAGTAAC 1850
TCAGAGAGTG AAACAGAAGA TGAAAGAGTA GGTGAAGATA CGCCTTTCCT 1900
GGGCATACAG AACGCCCTGG CAGCCAGTCT TGAGGCAACA CCTGCCTTCC 1950
GCCTGGCTGA CAGCAGGACT AACCCAGCAG GCCGCTTCTC GACACAGGAA 2000
GAAATCCAGG 2010

(2) INFORMATION FOR SEQ ID N0:13: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 669 amino acids 
45 (B) TYPE: aroino acid 

(D) TOPOLOGY: linear 

(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

50 Ala Arg Ala Pro Gin Arg Gly Arg Ser Leu Ser Pro Ser Arg Asp 
15 10 15 



Lys Leu Phe Pro Asn Pro lie Arg Ala Leu Gly Pro Asn Ser Pro 

20 25 30 

Ala Pro Ara Ala Vai Arg Val Glu Ara Ser Val Ser Gly Glu Met 

35 40 45 



Ser Glu Arg Lys Glu Gly Arg Gly Lys Gly Lys Gly Lys Lys Lys 

60 50 55 60 

Glu Arg Gly Ser Giy Lys Lys Pro Glu Ser Ala Ala Gly Ser Gin 

€= 70 ' 75 



77 



Ser Pro Ala Leu Pro Pro Arg Leu Lys Glu Met Lys Ser Gln Glu
80 85 90

Ser Ala Ala Gly Ser Lys Leu Val Leu Arg Cys Glu Thr Ser Ser
95 100 105

Glu Tyr Ser Ser Leu Arg Phe Lys Trp Phe Lys Asn Gly Asn Glu
110 115 120

Leu Asn Arg Lys Asn Lys Pro Gln Asn Ile Lys Ile Gln Lys Lys
125 130 135

Pro Gly Lys Ser Glu Leu Arg Ile Asn Lys Ala Ser Leu Ala Asp
140 145 150

Ser Gly Glu Tyr Met Cys Lys Val Ile Ser Lys Leu Gly Asn Asp
155 160 165

Ser Ala Ser Ala Asn Ile Thr Ile Val Glu Ser Asn Glu Ile Ile
170 175 180

Thr Gly Met Pro Ala Ser Thr Glu Gly Ala Tyr Val Ser Ser Glu 
185 190 195 

Ser Pro Ile Arg Ile Ser Val Ser Thr Glu Gly Ala Asn Thr Ser
200 205 210

Ser Ser Thr Ser Thr Ser Thr Thr Gly Thr Ser His Leu Val Lys
215 220 225

Cys Ala Glu Lys Glu Lys Thr Phe Cys Val Asn Gly Gly Glu Cys
230 235 240

Phe Met Val Lys Asp Leu Ser Asn Pro Ser Arg Tyr Leu Cys Lys
245 250 255

Cys Gln Pro Gly Phe Thr Gly Ala Arg Cys Thr Glu Asn Val Pro
260 265 270

Met Lys Val Gln Asn Gln Glu Lys Ala Glu Glu Leu Tyr Gln Lys
275 280 285

Arg Val Leu Thr Ile Thr Gly Ile Cys Ile Ala Leu Leu Val Val
290 295 300

Gly Ile Met Cys Val Val Ala Tyr Cys Lys Thr Lys Lys Gln Arg
305 310 315

Lys Lys Leu His Asp Arg Leu Arg Gln Ser Leu Arg Ser Glu Arg
320 325 330

Asn Asn Met Met Asn Ile Ala Asn Gly Pro His His Pro Asn Pro
335 340 345

Pro Pro Glu Asn Val Gln Leu Val Asn Gln Tyr Val Ser Lys Asn
350 355 360

Val Ile Ser Ser Glu His Ile Val Glu Arg Glu Ala Glu Thr Ser
365 370 375

Phe Ser Thr Ser His Tyr Thr Ser Thr Ala His His Ser Thr Thr
380 385 390

Val Thr Gln Thr Pro Ser His Ser Trp Ser Asn Gly His Thr Glu
395 400 405

Ser Ile Leu Ser Glu Ser His Ser Val Ile Val Met Ser Ser Val
410 415 420

Glu Asn Ser Arg His Ser Ser Pro Thr Gly Gly Pro Arg Gly Arg
425 430 435

Leu Asn Gly Thr Gly Gly Pro Arg Glu Cys Asn Ser Phe Leu Arg
440 445 450

His Ala Arg Glu Thr Pro Asp Ser Tyr Arg Asp Ser Pro His Ser
455 460 465

Glu Arg Tyr Val Ser Ala Met Thr Thr Pro Ala Arg Met Ser Pro
470 475 480

Val Asp Phe His Thr Pro Ser Ser Pro Lys Ser Pro Pro Ser Glu
485 490 495

Met Ser Pro Pro Val Ser Ser Met Thr Val Ser Met Pro Ser Met 
500 505 510 



Ala Val Ser Pro Phe Met Glu Glu Glu Arg Pro Leu Leu Leu Val
515 520 525

Thr Pro Pro Arg Leu Arg Glu Lys Lys Phe Asp His His Pro Gln
530 535 540

Gln Phe Ser Ser Phe His His Asn Pro Ala His Asp Ser Asn Ser
545 550 555

Leu Pro Ala Ser Pro Leu Arg Ile Val Glu Asp Glu Glu Tyr Glu
560 565 570

Thr Thr Gln Glu Tyr Glu Pro Ala Gln Glu Pro Val Lys Lys Leu
575 580 585

Ala Asn Ser Arg Arg Ala Lys Arg Thr Lys Pro Asn Gly His Ile
590 595 600

Ala Asn Arg Leu Glu Val Asp Ser Asn Thr Ser Ser Gln Ser Ser
605 610 615

Asn Ser Glu Ser Glu Thr Glu Asp Glu Arg Val Gly Glu Asp Thr
620 625 630

Pro Phe Leu Gly Ile Gln Asn Pro Leu Ala Ala Ser Leu Glu Ala
635 640 645

Thr Pro Ala Phe Arg Leu Ala Asp Ser Arg Thr Asn Pro Ala Gly
650 655 660

Arg Phe Ser Thr Gln Glu Glu Ile Gln
665 669

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 95 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Ser His Leu Val Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys Val
1 5 10 15

Asn Gly Gly Glu Cys Phe Met Val Lys Asp Leu Ser Asn Pro Ser
20 25 30

Arg Tyr Leu Cys Lys Cys Gln Pro Gly Phe Thr Gly Ala Arg Cys
35 40 45

Thr Glu Asn Val Pro Met Lys Val Gln Asn Gln Glu Lys Ala Glu
50 55 60

Glu Leu Tyr Gln Lys Arg Val Leu Thr Ile Thr Gly Ile Cys Ile
65 70 75



Ala Leu Leu Val Val Gly Ile Met Cys Val Val Ala Tyr Cys Lys
80 85 90

Thr Lys Lys Gln Arg
95 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 91 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

Asn Ser Asp Ser Glu Cys Pro Leu Ser His Asp Gly Tyr Cys Leu
1 5 10 15

His Asp Gly Val Cys Met Tyr Ile Glu Ala Leu Asp Lys Tyr Ala
20 25 30

Cys Asn Cys Val Val Gly Tyr Ile Gly Glu Arg Cys Gln Tyr Arg
35 40 45

Asp Leu Lys Trp Trp Glu Leu Arg His Ala Gly His Gly Gln Gln
50 55 60

Gln Lys Val Ile Val Val Ala Val Cys Val Val Val Leu Val Met
65 70 75

Leu Leu Leu Leu Ser Leu Trp Gly Ala His Tyr Tyr Arg Thr Gln
80 85 90

Lys 
91 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 82 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Asn Asp Cys Pro Asp Ser His Thr Gln Phe Cys Phe His Gly Thr
1 5 10 15

Cys Arg Phe Leu Val Gln Glu Asp Lys Pro Ala Cys Val Cys His
20 25 30

Ser Gly Tyr Val Gly Ala Arg Cys Glu His Ala Asp Leu Leu Ala
35 40 45

Val Val Ala Ala Ser Gln Lys Lys Gln Ala Ile Thr Ala Leu Val
50 55 60

Val Val Ser Ile Val Ala Leu Ala Val Leu Ile Ile Thr Cys Val
65 70 75

Leu Ile His Cys Cys Gln Val
80 82

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 87 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

Lys Lys Lys Asn Pro Cys Asn Ala Glu Phe Gln Asn Phe Cys Ile
1 5 10 15

His Gly Glu Cys Lys Tyr Ile Glu His Leu Glu Ala Val Thr Cys
20 25 30

Lys Cys Gln Gln Glu Tyr Phe Gly Glu Arg Cys Gly Glu Lys Ser
35 40 45

Met Lys Thr His Ser Met Ile Asp Ser Ser Leu Ser Lys Ile Ala
50 55 60

Leu Ala Ala Ile Ala Ala Phe Met Ser Ala Val Ile Leu Thr Ala
65 70 75

Val Ala Val Ile Thr Val Gln Leu Arg Arg Gln Tyr
80 85 87 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 87 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 



Lys Lys Lys Asn Pro Cys Ala Ala Lys Phe Gln Asn Phe Cys Ile
1 5 10 15

His Gly Glu Cys Arg Tyr Ile Glu Asn Leu Glu Val Val Thr Cys
20 25 30

His Cys His Gln Asp Tyr Phe Gly Glu Arg Cys Gly Glu Lys Thr
35 40 45

Met Lys Thr Gln Lys Lys Asp Asp Ser Asp Leu Ser Lys Ile Ala
50 55 60

Leu Ala Ala Ile Ile Val Phe Val Ser Ala Val Ser Val Ala Ala
65 70 75

Ile Gly Ile Ile Thr Ala Val Leu Leu Arg Lys Arg
80 85 87

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 86 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

Lys Lys Arg Asp Pro Cys Leu Arg Lys Tyr Lys Asp Phe Cys Ile
1 5 10 15

His Gly Glu Cys Lys Tyr Val Lys Glu Leu Arg Ala Pro Ser Cys
20 25 30

Ile Cys His Pro Gly Tyr His Gly Glu Arg Cys His Gly Leu Ser
35 40 45

Leu Pro Val Glu Asn Arg Leu Tyr Thr Tyr Asp His Thr Thr Ile
50 55 60

Leu Ala Val Val Ala Val Val Leu Ser Ser Val Cys Leu Leu Val
65 70 75

Ile Val Gly Leu Leu Met Phe Arg Tyr His Arg
80 85 86 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 

Arg Pro Asn Ala Arg Leu Pro Pro Gly Val Phe Tyr Cys 
1 5 10 13 

(2) INFORMATION FOR SEQ ID NO: 21: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 25 bases 
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:



CCTCGCTCCT TCTTCTTGCC CTTCC 25

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 496 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

AA AGA GCC GGC GAG GAG TTC CCC GAA ACT TGT TGG AAC 38
Arg Ala Gly Glu Glu Phe Pro Glu Thr Cys Trp Asn
1 5 10

TCC GGG CTC GCG CGG AGG CCA GGA GCT GAG CGG CGG CGG 77 
Ser Gly Leu Ala Arg Arg Pro Gly Ala Glu Arg Arg Arg 
15 20 25 



CTG CCG GAC GAT GGG AGC GTG AGC AGG ACG GTG ATA ACC 116
Leu Pro Asp Asp Gly Ser Val Ser Arg Thr Val Ile Thr
30 35

TCT CCC CGA TCG GGT TGC GAG GGC GCC GGG CAG AGG CCA 155
Ser Pro Arg Ser Gly Cys Glu Gly Ala Gly Gln Arg Pro
40 45 50

GGA CGC GAG CCG CCA GCG GTG GGA CCC ATC GAC GAC TTC 194
Gly Arg Glu Pro Pro Ala Val Gly Pro Ile Asp Asp Phe
55 60

CCG GGG CGA CAG GAG CAG CCC CGA GAG CCA GGG CGA GCG 233
Pro Gly Arg Gln Glu Gln Pro Arg Glu Pro Gly Arg Ala
65 70 75

CCC GTT CCA GGT GGC CGG ACC GCC CGC CGC GTC CGC GCC 272
Pro Val Pro Gly Gly Arg Thr Ala Arg Arg Val Arg Ala
80 85 90

GCG CTC CCT GCA GGC AAC GGG AGA CGC CCC CGC GCA GCG 311
Ala Leu Pro Ala Gly Asn Gly Arg Arg Pro Arg Ala Ala
95 100

CGA GCG CCT CAG CGC GGC CGC TCG CTC TCC CCC TCG AGG 350
Arg Ala Pro Gln Arg Gly Arg Ser Leu Ser Pro Ser Arg
105 110 115

GAC AAA CTT TTC CCA AAC CCG ATC CGA GCC CTT GGA CCA 389
Asp Lys Leu Phe Pro Asn Pro Ile Arg Ala Leu Gly Pro
120 125

AAC TCG CCT GCG CCG AGA GCC GTC CGC GTA GAG CGC TCC 428
Asn Ser Pro Ala Pro Arg Ala Val Arg Val Glu Arg Ser
130 135 140

GTC TCC GGC GAG ATG TCC GAG CGC AAA GAA GGC AGA GGC 467
Val Ser Gly Glu Met Ser Glu Arg Lys Glu Gly Arg Gly
145 150 155

AAA GGG AAG GGC AAG AAG AAG GAG CGA GG 496
Lys Gly Lys Gly Lys Lys Lys Glu Arg

160 164

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2490 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

GTGGCTGCGG GGCAATTGAA AAAGAGCCGG CGAGGAGTTC CCCGAAACTT 50
GTTGGAACTC CGGGCTCGCG CGGAGGCCAG GAGCTGAGCG GCGGCGGCTG 100

CCGGACGATG GGAGCGTGAG CAGGACGGTG ATAACCTCTC CCCGATCGGG 150 






TTGCGAGGGC GCCGGGCAGA GGCCAGGACG CGAGCCGCCA GCGGCGGGAC 200 



CCATCGACGA CTTCCCGGGG CGACAGGAGC AGCCCCGAGA GCCAGGGCGA 250 
GCGCCCGTTC CAGGTGGCCG GACCGCCCGC CGCGTCCGCG CCGCGCTCCC 300 
TGCAGGCAAC GGGAGACGCC CCCGCGCAGC GCGAGCGCCT CAGCGCGGCC 350

GCTCGCTCTC CCCATCGAGG GACAAACTTT TCCCAAACCC GATCCGAGCC 400 



CTTGGACCAA ACTCGCCTGC GCCGAGAGCC GTCCGCGTAG AGCGCTCCGT 450 



CTCCGGCGAG ATG TCC GAG CGC AAA GAA GGC AGA GGC AAA 490
Met Ser Glu Arg Lys Glu Gly Arg Gly Lys
1 5 10

GGG AAG GGC AAG AAG AAG GAG CGA GGC TCC GGC AAG AAG 529
Gly Lys Gly Lys Lys Lys Glu Arg Gly Ser Gly Lys Lys
15 20

CCG GAG TCC GCG GCG GGC AGC CAG AGC CCA GCC TTG CCT 568
Pro Glu Ser Ala Ala Gly Ser Gln Ser Pro Ala Leu Pro
25 30 35

CCC CAA TTG AAA GAG ATG AAA AGC CAG GAA TCG GCT GCA 607
Pro Gln Leu Lys Glu Met Lys Ser Gln Glu Ser Ala Ala
40 45

GGT TCC AAA CTA GTC CTT CGG TGT GAA ACC AGT TCT GAA 646
Gly Ser Lys Leu Val Leu Arg Cys Glu Thr Ser Ser Glu
50 55 60

TAC TCC TCT CTC AGA TTC AAG TGG TTC AAG AAT GGG AAT 685
Tyr Ser Ser Leu Arg Phe Lys Trp Phe Lys Asn Gly Asn
65 70 75

GAA TTG AAT CGA AAA AAC AAA CCA CAA AAT ATC AAG ATA 724
Glu Leu Asn Arg Lys Asn Lys Pro Gln Asn Ile Lys Ile
80 85

CAA AAA AAG CCA GGG AAG TCA GAA CTT CGC ATT AAC AAA 763
Gln Lys Lys Pro Gly Lys Ser Glu Leu Arg Ile Asn Lys
90 95 100

GCA TCA CTG GCT GAT TCT GGA GAG TAT ATG TGC AAA GTG 802
Ala Ser Leu Ala Asp Ser Gly Glu Tyr Met Cys Lys Val
105 110

ATC AGC AAA TTA GGA AAT GAC AGT GCC TCT GCC AAT ATC 841
Ile Ser Lys Leu Gly Asn Asp Ser Ala Ser Ala Asn Ile
115 120 125

ACC ATC GTG GAA TCA AAC GAG ATC ATC ACT GGT ATG CCA 880
Thr Ile Val Glu Ser Asn Glu Ile Ile Thr Gly Met Pro
130 135 140

GCC TCA ACT GAA GGA GCA TAT GTG TCT TCA GAG TCT CCC 919
Ala Ser Thr Glu Gly Ala Tyr Val Ser Ser Glu Ser Pro
145 150

ATT AGA ATA TCA GTA TCC ACA GAA GGA GCA AAT ACT TCT 958
Ile Arg Ile Ser Val Ser Thr Glu Gly Ala Asn Thr Ser
155 160 165

TCA TCT ACA TCT ACA TCC ACC ACT GGG ACA AGC CAT CTT 997
Ser Ser Thr Ser Thr Ser Thr Thr Gly Thr Ser His Leu
170 175

GTA AAA TGT GCG GAG AAG GAG AAA ACT TTC TGT GTG AAT 1036 
Val Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys Val Asn 
180 185 190 

GGA GGG GAG TGC TTC ATG GTG AAA GAC CTT TCA AAC CCC 1075 
Gly Gly Glu Cys Phe Met Val Lys Asp Leu Ser Asn Pro 
195 200 205 

TCG AGA TAC TTG TGC AAG TGC CCA AAT GAG TTT ACT GGT 1114
Ser Arg Tyr Leu Cys Lys Cys Pro Asn Glu Phe Thr Gly
210 215

GAT CGC TGC CAA AAC TAC GTA ATG GCC AGC TTC TAC AAG 1153
Asp Arg Cys Gln Asn Tyr Val Met Ala Ser Phe Tyr Lys
220 225 230

GCG GAG GAG CTG TAC CAG AAG AGA GTG CTG ACC ATA ACC 1192
Ala Glu Glu Leu Tyr Gln Lys Arg Val Leu Thr Ile Thr
235 240

GGC ATC TGC ATC GCC CTC CTT GTG GTC GGC ATC ATG TGT 1231
Gly Ile Cys Ile Ala Leu Leu Val Val Gly Ile Met Cys
245 250 255

GTG GTG GCC TAC TGC AAA ACC AAG AAA CAG CGG AAA AAG 1270
Val Val Ala Tyr Cys Lys Thr Lys Lys Gln Arg Lys Lys
260 265 270

CTG CAT GAC CGT CTT CGG CAG AGC CTT CGG TCT GAA CGA 1309
Leu His Asp Arg Leu Arg Gln Ser Leu Arg Ser Glu Arg
275 280

AAC AAT ATG ATG AAC ATT GCC AAT GGG CCT CAC CAT CCT 1348
Asn Asn Met Met Asn Ile Ala Asn Gly Pro His His Pro
285 290 295

AAC CCA CCC CCC GAG AAT GTC CAG CTG GTG AAT CAA TAC 1387
Asn Pro Pro Pro Glu Asn Val Gln Leu Val Asn Gln Tyr
300 305

GTA TCT AAA AAC GTC ATC TCC AGT GAG CAT ATT GTT GAG 1426
Val Ser Lys Asn Val Ile Ser Ser Glu His Ile Val Glu
310 315 320

AGA GAA GCA GAG ACA TCC TTT TCC ACC AGT CAC TAT ACT 1465
Arg Glu Ala Glu Thr Ser Phe Ser Thr Ser His Tyr Thr
325 330 335

TCC ACA GCC CAT CAC TCC ACT ACT GTC ACC CAG ACT CCT 1504
Ser Thr Ala His His Ser Thr Thr Val Thr Gln Thr Pro
340 345

AGC CAC AGC TGG AGC AAC GGA CAC ACT GAA AGC ATC CTT 1543
Ser His Ser Trp Ser Asn Gly His Thr Glu Ser Ile Leu
350 355 360

TCC GAA AGC CAC TCT GTA ATC GTG ATG TCA TCC GTA GAA 1582
Ser Glu Ser His Ser Val Ile Val Met Ser Ser Val Glu
365 370

AAC AGT AGG CAC AGC AGC CCA ACT GGG GGC CCA AGA GGA 1621
Asn Ser Arg His Ser Ser Pro Thr Gly Gly Pro Arg Gly
375 380 385

CGT CTT AAT GGC ACA GGA GGC CCT CGT GAA TGT AAC AGC 1660
Arg Leu Asn Gly Thr Gly Gly Pro Arg Glu Cys Asn Ser
390 395 400

TTC CTC AGG CAT GCC AGA GAA ACC CCT GAT TCC TAC CGA 1699
Phe Leu Arg His Ala Arg Glu Thr Pro Asp Ser Tyr Arg
405 410

GAC TCT CCT CAT AGT GAA AGG TAT GTG TCA GCC ATG ACC 1738
Asp Ser Pro His Ser Glu Arg Tyr Val Ser Ala Met Thr
415 420 425

ACC CCG GCT CGT ATG TCA CCT GTA GAT TTC CAC ACG CCA 1777
Thr Pro Ala Arg Met Ser Pro Val Asp Phe His Thr Pro
430 435

AGC TCC CCC AAA TCG CCC CCT TCG GAA ATG TCT CCA CCC 1816
Ser Ser Pro Lys Ser Pro Pro Ser Glu Met Ser Pro Pro
440 445 450

GTG TCC AGC ATG ACG GTG TCC AAG CCT TCC ATG GCG GTC 1855
Val Ser Ser Met Thr Val Ser Lys Pro Ser Met Ala Val
455 460 465

AGC CCC TTC ATG GAA GAA GAG AGA CCT CTA CTT CTC GTG 1894
Ser Pro Phe Met Glu Glu Glu Arg Pro Leu Leu Leu Val
470 475

ACA CCA CCA AGG CTG CGG GAG AAG AAG TTT GAC CAT CAC 1933
Thr Pro Pro Arg Leu Arg Glu Lys Lys Phe Asp His His
480 485 490



WO 92/20798 PCT/US92/04295

CCT CAG CAG TTC AGC TCC TTC CAC CAC AAC CCC GCG CAT 1972
Pro Gln Gln Phe Ser Ser Phe His His Asn Pro Ala His
495 500

GAC AGT AAC AGC CTC CCT GCT AGC CCC TTG AGG ATA GTG 2011
Asp Ser Asn Ser Leu Pro Ala Ser Pro Leu Arg Ile Val
505 510 515

GAG GAT GAG GAG TAT GAA ACG ACC CAA GAG TAC GAG CCA 2050
Glu Asp Glu Glu Tyr Glu Thr Thr Gln Glu Tyr Glu Pro
520 525 530

GCC CAA GAG CCT GTT AAG AAA CTC GCC AAT AGC CGG CGG 2089
Ala Gln Glu Pro Val Lys Lys Leu Ala Asn Ser Arg Arg
535 540

GCC AAA AGA ACC AAG CCC AAT GGC CAC ATT GCT AAC AGA 2128
Ala Lys Arg Thr Lys Pro Asn Gly His Ile Ala Asn Arg
545 550 555

TTG GAA GTG GAC AGC AAC ACA AGC TCC CAG AGC AGT AAC 2167
Leu Glu Val Asp Ser Asn Thr Ser Ser Gln Ser Ser Asn
560 565

TCA GAG AGT GAA ACA GAA GAT GAA AGA GTA GGT GAA GAT 2206
Ser Glu Ser Glu Thr Glu Asp Glu Arg Val Gly Glu Asp
570 575 580

ACG CCT TTC CTG GGC ATA CAG AAC CCC CTG GCA GCC AGT 2245
Thr Pro Phe Leu Gly Ile Gln Asn Pro Leu Ala Ala Ser
585 590 595

CTT GAG GCA ACA CCT GCC TTC CGC CTG GCT GAC AGC AGG 2284
Leu Glu Ala Thr Pro Ala Phe Arg Leu Ala Asp Ser Arg
600 605

ACT AAC CCA GCA GGC CGC TTC TCG ACA CAG GAA GAA ATC 2323
Thr Asn Pro Ala Gly Arg Phe Ser Thr Gln Glu Glu Ile
610 615 620

CAG GCC AGG CTG TCT AGT GTA ATT GCT AAC CAA GAC CCT 2362
Gln Ala Arg Leu Ser Ser Val Ile Ala Asn Gln Asp Pro
625 630

ATT GCT GTA TAAAACCTA AATAAACACA TAGATTCACC TGTAAAACTT 2410
Ile Ala Val
635 637



TATTTTATAT AATAAAGTAT TCCACCTTAA ATTAAACAAT TTATTTTATT 2460
TTAGCAGTTC TGCAAATAAA AAAAAAAAAA 2490

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1715 bases 
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

GCGCCTGCCT CCAACCTGCG GGCGGGAGGT GGGTGGCTGC GGGGCAATTG 50 
AAAAAGAGCC GGCGAGGAGT TCCCCGAAAC TTGTTGGAAC TCCGGGCTCG 100 
CGCGGAGGCC AGGAGCTGAG CGGCGGCGGC TGCCGGACGA TGGGAGCGTG 150 
AGCAGGACGG TGATAACCTC TCCCCGATCG GGTTGCGAGG GCGCCGGGCA 200 
GAGGCCAGGA CGCGAGCCGC CAGCGGCGGG ACCCATCGAC GACTTCCCGG 250 
GGCGACAGGA GCAGCCCCGA GAGCCAGGGC GAGCGCCCGT TCCAGGTGGC 300 
CGGACCGCCC GCCGCGTCCG CGCCGCGCTC CCTGCAGGCA ACGGGAGACG 350 
CCCCCGCGCA GCGCGAGCGC CTCAGCGCGG CCGCTCGCTC TCCCCATCGA 400 
GGGACAAACT TTTCCCAAAC CCGATCCGAG CCCTTGGACC AAACTCGCCT 450 



GCGCCGAGAG CCGTCCGCGT AGAGCGCTCC GTCTCCGGCG AG ATG 495
Met
1

TCC GAG CGC AAA GAA GGC AGA GGC AAA GGG AAG GGC AAG 534
Ser Glu Arg Lys Glu Gly Arg Gly Lys Gly Lys Gly Lys
5 10

AAG AAG GAG CGA GGC TCC GGC AAG AAG CCG GAG TCC GCG 573
Lys Lys Glu Arg Gly Ser Gly Lys Lys Pro Glu Ser Ala
15 20 25

GCG GGC AGC CAG AGC CCA GCC TTG CCT CCC CAA TTG AAA 612
Ala Gly Ser Gln Ser Pro Ala Leu Pro Pro Gln Leu Lys
30 35 40

GAG ATG AAA AGC CAG GAA TCG GCT GCA GGT TCC AAA CTA 651
Glu Met Lys Ser Gln Glu Ser Ala Ala Gly Ser Lys Leu
45 50

GTC CTT CGG TGT GAA ACC AGT TCT GAA TAC TCC TCT CTC 690
Val Leu Arg Cys Glu Thr Ser Ser Glu Tyr Ser Ser Leu
55 60 65

AGA TTC AAG TGG TTC AAG AAT GGG AAT GAA TTG AAT CGA 729
Arg Phe Lys Trp Phe Lys Asn Gly Asn Glu Leu Asn Arg
70 75

AAA AAC AAA CCA CAA AAT ATC AAG ATA CAA AAA AAG CCA 768
Lys Asn Lys Pro Gln Asn Ile Lys Ile Gln Lys Lys Pro
80 85 90

GGG AAG TCA GAA CTT CGC ATT AAC AAA GCA TCA CTG GCT 807
Gly Lys Ser Glu Leu Arg Ile Asn Lys Ala Ser Leu Ala
95 100 105

GAT TCT GGA GAG TAT ATG TGC AAA GTG ATC AGC AAA TTA 846
Asp Ser Gly Glu Tyr Met Cys Lys Val Ile Ser Lys Leu
110 115

GGA AAT GAC AGT GCC TCT GCC AAT ATC ACC ATC GTG GAA 885
Gly Asn Asp Ser Ala Ser Ala Asn Ile Thr Ile Val Glu
120 125 130

TCA AAC GAG ATC ATC ACT GGT ATG CCA GCC TCA ACT GAA 924
Ser Asn Glu Ile Ile Thr Gly Met Pro Ala Ser Thr Glu
135 140

GGA GCA TAT GTG TCT TCA GAG TCT CCC ATT AGA ATA TCA 963
Gly Ala Tyr Val Ser Ser Glu Ser Pro Ile Arg Ile Ser
145 150 155

GTA TCC ACA GAA GGA GCA AAT ACT TCT TCA TCT ACA TCT 1002
Val Ser Thr Glu Gly Ala Asn Thr Ser Ser Ser Thr Ser
160 165 170

ACA TCC ACC ACT GGG ACA AGC CAT CTT GTA AAA TGT GCG 1041
Thr Ser Thr Thr Gly Thr Ser His Leu Val Lys Cys Ala
175 180

GAG AAG GAG AAA ACT TTC TGT GTG AAT GGA GGG GAG TGC 1080
Glu Lys Glu Lys Thr Phe Cys Val Asn Gly Gly Glu Cys
185 190 195

TTC ATG GTG AAA GAC CTT TCA AAC CCC TCG AGA TAC TTG 1119
Phe Met Val Lys Asp Leu Ser Asn Pro Ser Arg Tyr Leu
200 205

TGC AAG TGC CCA AAT GAG TTT ACT GGT GAT CGC TGC CAA 1158
Cys Lys Cys Pro Asn Glu Phe Thr Gly Asp Arg Cys Gln
210 215 220



AAC TAC GTA ATG GCC AGC TTC TAC AGT ACG TCC ACT CCC 1197
Asn Tyr Val Met Ala Ser Phe Tyr Ser Thr Ser Thr Pro
225 230 235

TTT CTG TCT CTG CCT GAA TAGGA GCATGCTCAG TTGGTGCTGC 1240
Phe Leu Ser Leu Pro Glu
240 241

TTTCTTGTTG CTGCATCTCC CCTCAGATTC CACCTAGAGC TAGATGTGTC 1290
TTACCAGATC TAATATTGAC TGCCTCTGCC TGTCGCATGA GAACATTAAC 1340
AAAAGCAATT GTATTACTTC CTCTGTTCGC GACTAGTTGG CTCTGAGATA 1390
CTAATAGGTG TGTGAGGCTC CGGATGTTTC TGGAATTGAT ATTGAATGAT 1440
GTGATACAAA TTGATAGTCA ATATCAAGCA GTGAAATATG ATAATAAAGG 1490
CATTTCAAAG TCTCACTTTT ATTGATAAAA TAAAAATCAT TCTACTGAAC 1540
AGTCCATCTT CTTTATACAA TGACCACATC CTGAAAAGGG TGTTGCTAAG 1590
CTGTAACCGA TATGCACTTG AAATGATGGT AAGTTAATTT TGATTCAGAA 1640
TGTGTTATTT GTCACAAATA AACATAATAA AAGGAGTTCA GATGTTTTTC 1690
TTCATTAACC AAAAAAAAAA AAAAA 1715






(2) INFORMATION FOR SEQ ID NO: 25: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 2431 bases 
(B) TYPE: nucleic acid
(C) STRANDEDNESS: N.A.

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

GAGGCGCCTG CCTCCAACCT GCGGGCGGGA GGTGGGTGGC TGCGGGGCAA 50 
TTGAAAAAGA GCCGGCGAGG AGTTCCCCGA AACTTGTTGG AACTCCGGGC 100

TCGCGCGGAG GCCAGGAGCT GAGCGGCGGC GGCTGCCGGA CGATGGGAGC 150 
GTGAGCAGGA CGGTGATAAC CTCTCCCCGA TCGGGTTGCG AGGGCGCCGG 200 
GCAGAGGCCA GGACGCGAGC CGCCAGCGGC GGGACCCATC GACGACTTCC 250 
CGGGGCGACA GGAGCAGCCC CGAGAGCCAG GGCGAGCGCC CGTTCCAGGT 300 
GGCCGGACCG CCCGCCGCGT CCGCGCCGCG CTCCCTGCAG GCAACGGGAG 350

ACGCCCCCGC GCAGCGCGAG CGCCTCAGCG CGGCCGCTCG CTCTCCCCAT 400 
CGAGGGACAA ACTTTTCCCA AACCCGATCC GAGCCCTTGG ACCAAACTCG 450 



CCTGCGCCGA GAGCCGTCCG CGTAGAGCGC TCCGTCTCCG GCGAG AT 497
Met
1

G TCC GAG CGC AAA GAA GGC AGA GGC AAA GGG AAG GGC AAG 537
Ser Glu Arg Lys Glu Gly Arg Gly Lys Gly Lys Gly Lys
5 10

AAG AAG GAG CGA GGC TCC GGC AAG AAG CCG GAG TCC GCG 576
Lys Lys Glu Arg Gly Ser Gly Lys Lys Pro Glu Ser Ala
15 20 25

GCG GGC AGC CAG AGC CCA GCC TTG CCT CCC CAA TTG AAA 615
Ala Gly Ser Gln Ser Pro Ala Leu Pro Pro Gln Leu Lys
30 35 40

GAG ATG AAA AGC CAG GAA TCG GCT GCA GGT TCC AAA CTA 654
Glu Met Lys Ser Gln Glu Ser Ala Ala Gly Ser Lys Leu
45 50

GTC CTT CGG TGT GAA ACC AGT TCT GAA TAC TCC TCT CTC 693
Val Leu Arg Cys Glu Thr Ser Ser Glu Tyr Ser Ser Leu
55 60 65

AGA TTC AAG TGG TTC AAG AAT GGG AAT GAA TTG AAT CGA 732
Arg Phe Lys Trp Phe Lys Asn Gly Asn Glu Leu Asn Arg
70 75

AAA AAC AAA CCA CAA AAT ATC AAG ATA CAA AAA AAG CCA 771
Lys Asn Lys Pro Gln Asn Ile Lys Ile Gln Lys Lys Pro
80 85 90

GGG AAG TCA GAA CTT CGC ATT AAC AAA GCA TCA CTG GCT 810
Gly Lys Ser Glu Leu Arg Ile Asn Lys Ala Ser Leu Ala
95 100 105



GAT TCT GGA GAG TAT ATG TGC AAA GTG ATC AGC AAA TTA 849
Asp Ser Gly Glu Tyr Met Cys Lys Val Ile Ser Lys Leu
110 115

GGA AAT GAC AGT GCC TCT GCC AAT ATC ACC ATC GTG GAA 888
Gly Asn Asp Ser Ala Ser Ala Asn Ile Thr Ile Val Glu
120 125 130

TCA AAC GAG ATC ATC ACT GGT ATG CCA GCC TCA ACT GAA 927
Ser Asn Glu Ile Ile Thr Gly Met Pro Ala Ser Thr Glu
135 140

GGA GCA TAT GTG TCT TCA GAG TCT CCC ATT AGA ATA TCA 966
Gly Ala Tyr Val Ser Ser Glu Ser Pro Ile Arg Ile Ser
145 150 155

GTA TCC ACA GAA GGA GCA AAT ACT TCT TCA TCT ACA TCT 1005
Val Ser Thr Glu Gly Ala Asn Thr Ser Ser Ser Thr Ser
160 165 170

ACA TCC ACC ACT GGG ACA AGC CAT CTT GTA AAA TGT GCG 1044
Thr Ser Thr Thr Gly Thr Ser His Leu Val Lys Cys Ala
175 180

GAG AAG GAG AAA ACT TTC TGT GTG AAT GGA GGG GAG TGC 1083
Glu Lys Glu Lys Thr Phe Cys Val Asn Gly Gly Glu Cys
185 190 195

TTC ATG GTG AAA GAC CTT TCA AAC CCC TCG AGA TAC TTG 1122
Phe Met Val Lys Asp Leu Ser Asn Pro Ser Arg Tyr Leu
200 205

TGC AAG TGC CCA AAT GAG TTT ACT GGT GAT CGC TGC CAA 1161
Cys Lys Cys Pro Asn Glu Phe Thr Gly Asp Arg Cys Gln
210 215 220

AAC TAC GTA ATG GCC AGC TTC TAC AAG GCG GAG GAG CTG 1200
Asn Tyr Val Met Ala Ser Phe Tyr Lys Ala Glu Glu Leu
225 230 235

TAC CAG AAG AGA GTG CTG ACC ATA ACC GGC ATC TGC ATC 1239
Tyr Gln Lys Arg Val Leu Thr Ile Thr Gly Ile Cys Ile
240 245

GCC CTC CTT GTG GTC GGC ATC ATG TGT GTG GTG GCC TAC 1278
Ala Leu Leu Val Val Gly Ile Met Cys Val Val Ala Tyr
250 255 260

TGC AAA ACC AAG AAA CAG CGG AAA AAG CTG CAT GAC CGT 1317
Cys Lys Thr Lys Lys Gln Arg Lys Lys Leu His Asp Arg
265 270

CTT CGG CAG AGC CTT CGG TCT GAA CGA AAC AAT ATG ATG 1356
Leu Arg Gln Ser Leu Arg Ser Glu Arg Asn Asn Met Met
275 280 285

AAC ATT GCC AAT GGG CCT CAC CAT CCT AAC CCA CCC CCC 1395
Asn Ile Ala Asn Gly Pro His His Pro Asn Pro Pro Pro
290 295 300

GAG AAT GTC CAG CTG GTG AAT CAA TAC GTA TCT AAA AAC 1434
Glu Asn Val Gln Leu Val Asn Gln Tyr Val Ser Lys Asn
305 310

GTC ATC TCC AGT GAG CAT ATT GTT GAG AGA GAA GCA GAG 1473
Val Ile Ser Ser Glu His Ile Val Glu Arg Glu Ala Glu
315 320 325

ACA TCC TTT TCC ACC AGT CAC TAT ACT TCC ACA GCC CAT 1512
Thr Ser Phe Ser Thr Ser His Tyr Thr Ser Thr Ala His
330 335

CAC TCC ACT ACT GTC ACC CAG ACT CCT AGC CAC AGC TGG 1551
His Ser Thr Thr Val Thr Gln Thr Pro Ser His Ser Trp
340 345 350



AGC AAC GGA CAC ACT GAA AGC ATC CTT TCC GAA AGC CAC 1590
Ser Asn Gly His Thr Glu Ser Ile Leu Ser Glu Ser His
355 360 365

TCT GTA ATC GTG ATG TCA TCC GTA GAA AAC AGT AGG CAC 1629
Ser Val Ile Val Met Ser Ser Val Glu Asn Ser Arg His
370 375

AGC AGC CCA ACT GGG GGC CCA AGA GGA CGT CTT AAT GGC 1668
Ser Ser Pro Thr Gly Gly Pro Arg Gly Arg Leu Asn Gly
380 385 390

ACA GGA GGC CCT CGT GAA TGT AAC AGC TTC CTC AGG CAT 1707
Thr Gly Gly Pro Arg Glu Cys Asn Ser Phe Leu Arg His
395 400

GCC AGA GAA ACC CCT GAT TCC TAC CGA GAC TCT CCT CAT 1746
Ala Arg Glu Thr Pro Asp Ser Tyr Arg Asp Ser Pro His
405 410 415

AGT GAA AGG TAAAA CCGAAGGCAA AGCTACTGCA GAGGAGAAAC 1790
Ser Glu Arg
420

TCAGTCAGAG AATCCCTGTG AGCACCTGCG GTCTCACCTC AGGAAATCTA 1840
CTCTAATCAG AATAAGGGGC GGCAGTTACC TGTTCTAGGA GTGCTCCTAG 1890



TTGATGAAGT CATCTCTTTG TTTGACGGAA CTTATTTCTT CTGAGCTTCT 1940
CTCGTCGTCC CAGTGACTGA CAGGCAACAG ACTCTTAAAG AGCTGGGATG 1990
CTTTGATGCG GAAGGTGCAG CACATGGAGT TTCCAGCTCT GGCCATGGGC 2040
TCAGACCCAC TCGGGGTCTC AGTGTCCTCA GTTGTAACAT TAGAGAGATG 2090
GCATCAATGC TTGATAAGGA CCCTTCTATA ATTCCAATTG CCAGTTATCC 2140
AAACTCTGAT TCGGTGGTCG AGCTGGCCTC GTGTTCTTAT CTGCTAACCC 2190
TGTCTTACCT TCCAGCCTCA GTTAAGTCAA ATCAAGGGCT ATGTCATTGC 2240
TGAATGTCAT GGGGGGCAAC TGCTTGCCCT CCACCCTATA GTATCTATTT 2290
TATGAAATTC CAAGAAGGGA TGAATAAATA AATCTCTTGG ATGCTGCGTC 2340
TGGCAGTCTT CACGGGTGGT TTTCAAAGCA GAAAAAAAAA AAAAAAAAAA 2390
AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA A 2431



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 625 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

Met Ser Glu Arg Lys Glu Gly Arg Gly Lys Gly Lys Gly Lys Lys
1 5 10 15

Lys Glu Arg Gly Ser Gly Lys Lys Pro Glu Ser Ala Ala Gly Ser
20 25 30

Gln Ser Pro Ala Leu Pro Pro Arg Leu Lys Glu Met Lys Ser Gln
35 40 45

Glu Ser Ala Ala Gly Ser Lys Leu Val Leu Arg Cys Glu Thr Ser
50 55 60

Ser Glu Tyr Ser Ser Leu Arg Phe Lys Trp Phe Lys Asn Gly Asn
65 70 75

Glu Leu Asn Arg Lys Asn Lys Pro Gln Asn Ile Lys Ile Gln Lys
80 85 90

Lys Pro Gly Lys Ser Glu Leu Arg Ile Asn Lys Ala Ser Leu Ala
95 100 105

Asp Ser Gly Glu Tyr Met Cys Lys Val Ile Ser Lys Leu Gly Asn
110 115 120

Asp Ser Ala Ser Ala Asn Ile Thr Ile Val Glu Ser Asn Glu Ile
125 130 135

Ile Thr Gly Met Pro Ala Ser Thr Glu Gly Ala Tyr Val Ser Ser
140 145 150

Glu Ser Pro Ile Arg Ile Ser Val Ser Thr Glu Gly Ala Asn Thr
155 160 165

Ser Ser Ser Thr Ser Thr Ser Thr Thr Gly Thr Ser His Leu Val
170 175 180

Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys Val Asn Gly Gly Glu
185 190 195

Cys Phe Met Val Lys Asp Leu Ser Asn Pro Ser Arg Tyr Leu Cys
200 205 210

Lys Cys Gln Pro Gly Phe Thr Gly Ala Arg Cys Thr Glu Asn Val
215 220 225

Pro Met Lys Val Gln Asn Gln Glu Lys Ala Glu Glu Leu Tyr Gln
230 235 240

Lys Arg Val Leu Thr Ile Thr Gly Ile Cys Ile Ala Leu Leu Val
245 250 255



Val Gly Ile Met Cys Val Val Ala Tyr Cys Lys Thr Lys Lys Gln
260 265 270

Arg Lys Lys Leu His Asp Arg Leu Arg Gln Ser Leu Arg Ser Glu
275 280 285

Arg Asn Asn Met Met Asn Ile Ala Asn Gly Pro His His Pro Asn
290 295 300

Pro Pro Pro Glu Asn Val Gln Leu Val Asn Gln Tyr Val Ser Lys
305 310 315

Asn Val Ile Ser Ser Glu His Ile Val Glu Arg Glu Ala Glu Thr
320 325 330

Ser Phe Ser Thr Ser His Tyr Thr Ser Thr Ala His His Ser Thr
335 340 345

Thr Val Thr Gln Thr Pro Ser His Ser Trp Ser Asn Gly His Thr
350 355 360

Glu Ser Ile Leu Ser Glu Ser His Ser Val Ile Val Met Ser Ser
365 370 375

Val Glu Asn Ser Arg His Ser Ser Pro Thr Gly Gly Pro Arg Gly
380 385 390

Arg Leu Asn Gly Thr Gly Gly Pro Arg Glu Cys Asn Ser Phe Leu

395 400 405



Arg His Ala Arg Glu Thr Pro Asp Ser Tyr Arg Asp Ser Pro His 
410 415 420 

Ser Glu Arg Tyr Val Ser Ala Met Thr Thr Pro Ala Arg Met Ser 
425 430 435 

Pro Val Asp Phe His Thr Pro Ser Ser Pro Lys Ser Pro Pro Ser 
440 445 450 

Glu Met Ser Pro Pro Val Ser Ser Met Thr Val Ser Met Pro Ser 
455 460 465 

Met Ala Val Ser Pro Phe Met Glu Glu Glu Arg Pro Leu Leu Leu 
470 475 480 

Val Thr Pro Pro Arg Leu Arg Glu Lys Lys Phe Asp His His Pro 
485 490 495 

Gln Gln Phe Ser Ser Phe His His Asn Pro Ala His Asp Ser Asn
500 505 510

Ser Leu Pro Ala Ser Pro Leu Arg Ile Val Glu Asp Glu Glu Tyr
515 520 525

Glu Thr Thr Gln Glu Tyr Glu Pro Ala Gln Glu Pro Val Lys Lys
530 535 540

Leu Ala Asn Ser Arg Arg Ala Lys Arg Thr Lys Pro Asn Gly His
545 550 555

Ile Ala Asn Arg Leu Glu Val Asp Ser Asn Thr Ser Ser Gln Ser
560 565 570

Ser Asn Ser Glu Ser Glu Thr Glu Asp Glu Arg Val Gly Glu Asp
575 580 585

Thr Pro Phe Leu Gly Ile Gln Asn Pro Leu Ala Ala Ser Leu Glu
590 595 600

Ala Thr Pro Ala Phe Arg Leu Ala Asp Ser Arg Thr Asn Pro Ala
605 610 615

Gly Arg Phe Ser Thr Gln Glu Glu Ile Gln
620 625



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 645 amino acids 

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

Met Ser Glu Arg Lys Glu Gly Arg Gly Lys Gly Lys Gly Lys Lys
1 5 10 15

Lys Glu Arg Gly Ser Gly Lys Lys Pro Glu Ser Ala Ala Gly Ser 
20 25 30 

Gln Ser Pro Ala Leu Pro Pro Gln Leu Lys Glu Met Lys Ser Gln
35 40 45

Glu Ser Ala Ala Gly Ser Lys Leu Val Leu Arg Cys Glu Thr Ser
50 55 60

Ser Glu Tyr Ser Ser Leu Arg Phe Lys Trp Phe Lys Asn Gly Asn 
65 70 75 

Glu Leu Asn Arg Lys Asn Lys Pro Gln Asn Ile Lys Ile Gln Lys
80 85 90

Lys Pro Gly Lys Ser Glu Leu Arg Ile Asn Lys Ala Ser Leu Ala
95 100 105

Asp Ser Gly Glu Tyr Met Cys Lys Val Ile Ser Lys Leu Gly Asn
110 115 120

Asp Ser Ala Ser Ala Asn Ile Thr Ile Val Glu Ser Asn Glu Ile
125 130 135

Ile Thr Gly Met Pro Ala Ser Thr Glu Gly Ala Tyr Val Ser Ser
140 145 150

Glu Ser Pro Ile Arg Ile Ser Val Ser Thr Glu Gly Ala Asn Thr
155 160 165

Ser Ser Ser Thr Ser Thr Ser Thr Thr Gly Thr Ser His Leu Val
170 175 180

Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys Val Asn Gly Gly Glu
185 190 195

Cys Phe Met Val Lys Asp Leu Ser Asn Pro Ser Arg Tyr Leu Cys
200 205 210

Lys Cys Pro Asn Glu Phe Thr Gly Asp Arg Cys Gln Asn Tyr Val
215 220 225

Met Ala Ser Phe Tyr Lys His Leu Gly Ile Glu Phe Met Glu Ala
230 235 240 

Glu Glu Leu Tyr Gin Lys Arg Val Leu Thr He Thr Gly He Cys 
245 250 255 

He Ala Leu Leu Val Val Gly He Met Cys Val Val Ala Tyr Cys 
260 265 270 

Lys Thr Lys Lys Gin Arg Lys Lys Leu His Asp Arg Leu Arg Gin 
275 280 285 

Ser Leu Arg Ser Glu Arg Asn Asn Met Met Asn He Ala Asn Gly 
290 295 300 

Pro His His Pro Asn Pro Pro Pro Glu Asn Val Gin Leu Val Asn 
305 310 315 

Gin Tvr Val Ser Lvs Asn Val He Ser Ser Glu His He Val Glu 
320 325 330 

Arg Glu Ala Glu Thr Ser Phe Ser Thr Ser His Tyr Thr Ser Thr 
335 340 345 

Ala His His Ser Thr Thr Val Thr Gln Thr Pro Ser His Ser Trp
350 355 360

Ser Asn Gly His Thr Glu Ser Ile Leu Ser Glu Ser His Ser Val
365 370 375

Ile Val Met Ser Ser Val Glu Asn Ser Arg His Ser Ser Pro Thr
380 385 390

Gly Gly Pro Arg Gly Arg Leu Asn Gly Thr Gly Gly Pro Arg Glu
395 400 405

Cys Asn Ser Phe Leu Arg His Ala Arg Glu Thr Pro Asp Ser Tyr
410 415 420

Arg Asp Ser Pro His Ser Glu Arg Tyr Val Ser Ala Met Thr Thr
425 430 435

Pro Ala Arg Met Ser Pro Val Asp Phe His Thr Pro Ser Ser Pro
440 445 450

Lys Ser Pro Pro Ser Glu Met Ser Pro Pro Val Ser Ser Met Thr
455 460 465

Val Ser Met Pro Ser Met Ala Val Ser Pro Phe Met Glu Glu Glu
470 475 480

Arg Pro Leu Leu Leu Val Thr Pro Pro Arg Leu Arg Glu Lys Lys
485 490 495

Phe Asp His His Pro Gln Gln Phe Ser Ser Phe His His Asn Pro
500 505 510

Ala His Asp Ser Asn Ser Leu Pro Ala Ser Pro Leu Arg Ile Val
515 520 525

Glu Asp Glu Glu Tyr Glu Thr Thr Gln Glu Tyr Glu Pro Ala Gln
530 535 540

Glu Pro Val Lys Lys Leu Ala Asn Ser Arg Arg Ala Lys Arg Thr
545 550 555

Lys Pro Asn Gly His Ile Ala Asn Arg Leu Glu Val Asp Ser Asn
560 565 570

Thr Ser Ser Gln Ser Ser Asn Ser Glu Ser Glu Thr Glu Asp Glu
575 580 585

Arg Val Gly Glu Asp Thr Pro Phe Leu Gly Ile Gln Asn Pro Leu
590 595 600

Ala Ala Ser Leu Glu Ala Thr Pro Ala Phe Arg Leu Ala Asp Ser
605 610 615

Arg Thr Asn Pro Ala Gly Arg Phe Ser Thr Gln Glu Glu Ile Gln
620 625 630

Ala Arg Leu Ser Ser Val Ile Ala Asn Gln Asp Pro Ile Ala Val
635 640 645



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 637 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:

Met Ser Glu Arg Lys Glu Gly Arg Gly Lys Gly Lys Gly Lys Lys
1 5 10 15

Lys Glu Arg Gly Ser Gly Lys Lys Pro Glu Ser Ala Ala Gly Ser
20 25 30

Gln Ser Pro Ala Leu Pro Pro Gln Leu Lys Glu Met Lys Ser Gln
35 40 45

Glu Ser Ala Ala Gly Ser Lys Leu Val Leu Arg Cys Glu Thr Ser
50 55 60

Ser Glu Tyr Ser Ser Leu Arg Phe Lys Trp Phe Lys Asn Gly Asn
65 70 75

Glu Leu Asn Arg Lys Asn Lys Pro Gln Asn Ile Lys Ile Gln Lys
80 85 90

Lys Pro Gly Lys Ser Glu Leu Arg Ile Asn Lys Ala Ser Leu Ala
95 100 105

Asp Ser Gly Glu Tyr Met Cys Lys Val Ile Ser Lys Leu Gly Asn
110 115 120

Asp Ser Ala Ser Ala Asn Ile Thr Ile Val Glu Ser Asn Glu Ile
125 130 135

Ile Thr Gly Met Pro Ala Ser Thr Glu Gly Ala Tyr Val Ser Ser
140 145 150

Glu Ser Pro Ile Arg Ile Ser Val Ser Thr Glu Gly Ala Asn Thr
155 160 165

Ser Ser Ser Thr Ser Thr Ser Thr Thr Gly Thr Ser His Leu Val
170 175 180

Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys Val Asn Gly Gly Glu
185 190 195

Cys Phe Met Val Lys Asp Leu Ser Asn Pro Ser Arg Tyr Leu Cys
200 205 210

Lys Cys Pro Asn Glu Phe Thr Gly Asp Arg Cys Gln Asn Tyr Val
215 220 225

Met Ala Ser Phe Tyr Lys Ala Glu Glu Leu Tyr Gln Lys Arg Val
230 235 240

Leu Thr Ile Thr Gly Ile Cys Ile Ala Leu Leu Val Val Gly Ile
245 250 255

Met Cys Val Val Ala Tyr Cys Lys Thr Lys Lys Gln Arg Lys Lys
260 265 270

Leu His Asp Arg Leu Arg Gln Ser Leu Arg Ser Glu Arg Asn Asn
275 280 285

Met Met Asn Ile Ala Asn Gly Pro His His Pro Asn Pro Pro Pro
290 295 300

Glu Asn Val Gln Leu Val Asn Gln Tyr Val Ser Lys Asn Val Ile

305 310 315

Ser Ser Glu His Ile Val Glu Arg Glu Ala Glu Thr Ser Phe Ser
320 325 330

Thr Ser His Tyr Thr Ser Thr Ala His His Ser Thr Thr Val Thr
335 340 345

Gln Thr Pro Ser His Ser Trp Ser Asn Gly His Thr Glu Ser Ile
350 355 360

Leu Ser Glu Ser His Ser Val Ile Val Met Ser Ser Val Glu Asn
365 370 375

Ser Arg His Ser Ser Pro Thr Gly Gly Pro Arg Gly Arg Leu Asn
380 385 390

Gly Thr Gly Gly Pro Arg Glu Cys Asn Ser Phe Leu Arg His Ala
395 400 405

Arg Glu Thr Pro Asp Ser Tyr Arg Asp Ser Pro His Ser Glu Arg
410 415 420

Tyr Val Ser Ala Met Thr Thr Pro Ala Arg Met Ser Pro Val Asp
425 430 435

Phe His Thr Pro Ser Ser Pro Lys Ser Pro Pro Ser Glu Met Ser
440 445 450

Pro Pro Val Ser Ser Met Thr Val Ser Lys Pro Ser Met Ala Val
455 460 465

Ser Pro Phe Met Glu Glu Glu Arg Pro Leu Leu Leu Val Thr Pro
470 475 480

Pro Arg Leu Arg Glu Lys Lys Phe Asp His His Pro Gln Gln Phe
485 490 495

Ser Ser Phe His His Asn Pro Ala His Asp Ser Asn Ser Leu Pro
500 505 510

Ala Ser Pro Leu Arg Ile Val Glu Asp Glu Glu Tyr Glu Thr Thr
515 520 525

Gln Glu Tyr Glu Pro Ala Gln Glu Pro Val Lys Lys Leu Ala Asn
530 535 540

Ser Arg Arg Ala Lys Arg Thr Lys Pro Asn Gly His Ile Ala Asn
545 550 555

Arg Leu Glu Val Asp Ser Asn Thr Ser Ser Gln Ser Ser Asn Ser
560 565 570

Glu Ser Glu Thr Glu Asp Glu Arg Val Gly Glu Asp Thr Pro Phe
575 580 585

Leu Gly Ile Gln Asn Pro Leu Ala Ala Ser Leu Glu Ala Thr Pro
590 595 600

Ala Phe Arg Leu Ala Asp Ser Arg Thr Asn Pro Ala Gly Arg Phe
605 610 615

Ser Thr Gln Glu Glu Ile Gln Ala Arg Leu Ser Ser Val Ile Ala
620 625 630

Asn Gln Asp Pro Ile Ala Val
635 637

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 420 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:

Met Ser Glu Arg Lys Glu Gly Arg Gly Lys Gly Lys Gly Lys Lys
1 5 10 15

Lys Glu Arg Gly Ser Gly Lys Lys Pro Glu Ser Ala Ala Gly Ser
20 25 30

Gln Ser Pro Ala Leu Pro Pro Gln Leu Lys Glu Met Lys Ser Gln
35 40 45

Glu Ser Ala Ala Gly Ser Lys Leu Val Leu Arg Cys Glu Thr Ser
50 55 60

Ser Glu Tyr Ser Ser Leu Arg Phe Lys Trp Phe Lys Asn Gly Asn
65 70 75

Glu Leu Asn Arg Lys Asn Lys Pro Gln Asn Ile Lys Ile Gln Lys
80 85 90

Lys Pro Gly Lys Ser Glu Leu Arg Ile Asn Lys Ala Ser Leu Ala
95 100 105

Asp Ser Gly Glu Tyr Met Cys Lys Val Ile Ser Lys Leu Gly Asn
110 115 120

Asp Ser Ala Ser Ala Asn Ile Thr Ile Val Glu Ser Asn Glu Ile
125 130 135

Ile Thr Gly Met Pro Ala Ser Thr Glu Gly Ala Tyr Val Ser Ser
140 145 150

Glu Ser Pro Ile Arg Ile Ser Val Ser Thr Glu Gly Ala Asn Thr
155 160 165

Ser Ser Ser Thr Ser Thr Ser Thr Thr Gly Thr Ser His Leu Val
170 175 180

Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys Val Asn Gly Gly Glu
185 190 195

Cys Phe Met Val Lys Asp Leu Ser Asn Pro Ser Arg Tyr Leu Cys
200 205 210

Lys Cys Pro Asn Glu Phe Thr Gly Asp Arg Cys Gln Asn Tyr Val
215 220 225

Met Ala Ser Phe Tyr Lys Ala Glu Glu Leu Tyr Gln Lys Arg Val
230 235 240

Leu Thr Ile Thr Gly Ile Cys Ile Ala Leu Leu Val Val Gly Ile
245 250 255

Met Cys Val Val Ala Tyr Cys Lys Thr Lys Lys Gln Arg Lys Lys
260 265 270

Leu His Asp Arg Leu Arg Gln Ser Leu Arg Ser Glu Arg Asn Asn
275 280 285

Met Met Asn Ile Ala Asn Gly Pro His His Pro Asn Pro Pro Pro
290 295 300

Glu Asn Val Gln Leu Val Asn Gln Tyr Val Ser Lys Asn Val Ile
305 310 315

Ser Ser Glu His Ile Val Glu Arg Glu Ala Glu Thr Ser Phe Ser
320 325 330

Thr Ser His Tyr Thr Ser Thr Ala His His Ser Thr Thr Val Thr
335 340 345

Gln Thr Pro Ser His Ser Trp Ser Asn Gly His Thr Glu Ser Ile
350 355 360

Leu Ser Glu Ser His Ser Val Ile Val Met Ser Ser Val Glu Asn
365 370 375

Ser Arg His Ser Ser Pro Thr Gly Gly Pro Arg Gly Arg Leu Asn
380 385 390

Gly Thr Gly Gly Pro Arg Glu Cys Asn Ser Phe Leu Arg His Ala
395 400 405

Arg Glu Thr Pro Asp Ser Tyr Arg Asp Ser Pro His Ser Glu Arg
410 415 420



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 241 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30: 

Met Ser Glu Arg Lys Glu Gly Arg Gly Lys Gly Lys Gly Lys Lys
1 5 10 15

Lys Glu Arg Gly Ser Gly Lys Lys Pro Glu Ser Ala Ala Gly Ser
20 25 30

Gln Ser Pro Ala Leu Pro Pro Gln Leu Lys Glu Met Lys Ser Gln
35 40 45

Glu Ser Ala Ala Gly Ser Lys Leu Val Leu Arg Cys Glu Thr Ser
50 55 60

Ser Glu Tyr Ser Ser Leu Arg Phe Lys Trp Phe Lys Asn Gly Asn
65 70 75

Glu Leu Asn Arg Lys Asn Lys Pro Gln Asn Ile Lys Ile Gln Lys
80 85 90

Lys Pro Gly Lys Ser Glu Leu Arg Ile Asn Lys Ala Ser Leu Ala
95 100 105

Asp Ser Gly Glu Tyr Met Cys Lys Val Ile Ser Lys Leu Gly Asn
110 115 120

Asp Ser Ala Ser Ala Asn Ile Thr Ile Val Glu Ser Asn Glu Ile
125 130 135

Ile Thr Gly Met Pro Ala Ser Thr Glu Gly Ala Tyr Val Ser Ser
140 145 150

Glu Ser Pro Ile Arg Ile Ser Val Ser Thr Glu Gly Ala Asn Thr
155 160 165

Ser Ser Ser Thr Ser Thr Ser Thr Thr Gly Thr Ser His Leu Val
170 175 180

Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys Val Asn Gly Gly Glu
185 190 195

Cys Phe Met Val Lys Asp Leu Ser Asn Pro Ser Arg Tyr Leu Cys
200 205 210

Lys Cys Pro Asn Glu Phe Thr Gly Asp Arg Cys Gln Asn Tyr Val
215 220 225

Met Ala Ser Phe Tyr Ser Thr Ser Thr Pro Phe Leu Ser Leu Pro
230 235 240

Glu 
241 
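
The sequence listings above give each polypeptide in three-letter residue codes, with position numbers on the following row and a declared LENGTH in the header. As an illustration only, the Python sketch below converts such rows to one-letter code and checks the residue count against a declared length. The function names `parse_listing` and `check_length` are hypothetical and are not part of the specification; the sketch assumes the rows have already been cleaned of scanning errors, and it raises an error on any token that is not a standard three-letter code.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_listing(text):
    """Return the one-letter sequence for listing rows, skipping position rows."""
    residues = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or all(tok.isdigit() for tok in tokens):
            continue  # blank line or row of position numbers
        for tok in tokens:
            if tok not in THREE_TO_ONE:
                raise ValueError(f"unrecognized residue code: {tok!r}")
            residues.append(THREE_TO_ONE[tok])
    return "".join(residues)


def check_length(text, declared_length):
    """Return (matches, sequence) for a listing and its declared length."""
    seq = parse_listing(text)
    return len(seq) == declared_length, seq


# Final rows of SEQ ID NO:30 (residues 226-241) as a small example.
TAIL = """Met Ala Ser Phe Tyr Ser Thr Ser Thr Pro Phe Leu Ser Leu Pro
230 235 240

Glu
241"""

print(parse_listing(TAIL))  # MASFYSTSTPFLSLPE
```

Applied to a complete listing, `check_length(listing_text, 241)` would report whether the parsed residue count agrees with the header; a mismatch usually points to a dropped or garbled residue.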
WE CLAIM:

1. A composition comprising isolated heregulin polypeptide.

2. The composition of claim 1 wherein the heregulin is antigenically active.

3. The composition of claim 1 wherein the heregulin is biologically active. 

4. The composition of claim 3 wherein the heregulin is HRG-GFD. 

5. The composition of claim 1 wherein the heregulin is heregulin-α, -β1, -β2, or -β3.

6. The composition of claim 3 wherein the heregulin is human heregulin-α-GFD.

7. The composition of claim 3 wherein the heregulin is human heregulin-β1-GFD,
heregulin-β2-GFD or heregulin-β3-GFD.

8. The composition of claim 1 further comprising pharmaceutically acceptable carrier.

9. The composition of claim 8 wherein the heregulin is a heregulin GFD. 

10. The composition of claim 9 further comprising an immune adjuvant. 

11. The composition of claim 10 wherein the heregulin GFD comprises an immunogenic,
non-heregulin polypeptide.

12. The composition of claim 1 wherein the heregulin is NTD-GFD.

13. The composition of claim 1 wherein the heregulin is NTD-GFD-transmembrane
polypeptide.

14. The composition of claim 1 wherein the heregulin is HRG-GFD.

15. The composition of claim 1 wherein the heregulin comprises a cytoplasmic domain.

16. The composition of claim 1 wherein the heregulin is NTD-GFD and it has an amino
acid sequence which is at least 85% homologous with the native heregulin-α, -β1,
-β2, -β3 NTD-GFD sequence.

17. The composition of claim 1 wherein the heregulin polypeptide comprises an enzyme.

18. The composition of claim 16 wherein the heregulin is HRG-α.

19. The composition of claim 18 wherein the heregulin-α has an amino acid substituted,
deleted or inserted adjacent to any one of residues 1-23, 107-108, 121-123, 128-130 and
163-247 (Fig. 15).

20. The composition of claim 16 wherein the heregulin is HRG-β1.

21. The composition of claim 20 wherein the heregulin-β1 has an amino acid
substituted, deleted or inserted adjacent to residues 1-23, 107-108, 121-123, 128-130
and 163-252 (Fig. 15).

22. The composition of claim 16 wherein the heregulin is HRG-β2.

23. The composition of claim 22 wherein the heregulin-β2 has an amino acid substituted,
deleted or inserted adjacent to any one of residues 1-23, 107-108, 121-123, 128-130
and 163-244 (Fig. 15).

24. The composition of claim 16 wherein the heregulin is HRG-β3.

25. The composition of claim 24 wherein the heregulin-β3 has an amino acid
substituted, deleted or inserted adjacent to any one of residues 1-23, 107-108, 121-123,
128-130 and 163-241 (Fig. 15).

26. An isolated antibody that is capable of binding a heregulin polypeptide. 

27. The isolated antibody of claim 26 that is capable of binding specifically to a heregulin-α,
heregulin-β1, heregulin-β2, or heregulin-β3.

28. Isolated heregulin encoding nucleic acid.

29. The nucleic acid of claim 28 which encodes heregulin-α, heregulin-β1, heregulin-β2, or
heregulin-β3 polypeptide.

30. The nucleic acid of claim 28 that encodes a heregulin-GFD.

31. An expression vector comprising the nucleic acid of claim 28.

32. The expression vector of claim 31 wherein the nucleic acid encodes a heregulin-GFD.

33. A host cell transformed with a vector of claim 31.

34. A method comprising culturing the host cell of claim 33 to express the heregulin and
recovering the heregulin from the host cell. 

35. The method of claim 34 wherein the heregulin is heregulin-α, heregulin-β1, heregulin-β2,
or heregulin-β3.

36. The method of claim 34 wherein the heregulin is heregulin-NTD-GFD.

37. The method of claim 34 wherein the heregulin is heregulin-GFD.

38. A method of determining the presence of a heregulin nucleic acid, comprising 
contacting the nucleic acid of claim 28 with a test sample nucleic acid and determining 
whether hybridization has occurred.

39. A method of amplifying a nucleic acid test sample comprising priming a nucleic acid 
polymerase chain reaction with the nucleic acid of claim 28. 



40. A method for purifying a heregulin comprising adsorbing heregulin from a contaminated
solution thereof onto heparin Sepharose or a cation exchange resin. 
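
Claim 16 recites a sequence that is "at least 85% homologous" with a native NTD-GFD sequence. As a rough arithmetic illustration only, the Python sketch below computes percent identity over two sequences that have already been aligned to equal length. The function name `percent_identity`, the use of `-` as a gap character, and the choice of denominator (all columns that are not gap-versus-gap) are assumptions made for this sketch; the definition of homology given in the specification governs the claim, not this code.

```python
def percent_identity(a, b):
    """Percent of aligned columns with identical residues ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    # Count every column except those where both sequences have a gap.
    columns = sum(1 for x, y in zip(a, b) if not (x == "-" and y == "-"))
    if columns == 0:
        raise ValueError("no comparable columns")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / columns


def meets_threshold(a, b, threshold=85.0):
    """True if the percent identity is at least the given threshold."""
    return percent_identity(a, b) >= threshold


# Residues 226-240 of SEQ ID NO:27 and SEQ ID NO:28 in one-letter code,
# compared position by position without any alignment step.
beta1_window = "MASFYKHLGIEFMEA"
beta2_window = "MASFYKAEELYQKRV"
print(percent_identity(beta1_window, beta2_window))  # 40.0
```

The two windows share only their first six residues, so this short stretch falls well below 85%; a real comparison would run over the full aligned NTD-GFD sequences rather than a 15-residue window.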
FIG. 3

GG GCG CGA GCG CCT CAG CGC GGC CGC TCG CTC TCC CCC 38 
Ala Arg Ala Pro Gln Arg Gly Arg Ser Leu Ser Pro
1 5 10

TCG AGG GAC AAA CTT TTC CCA AAC CCG ATC CGA GCC CTT 77 
Ser Arg Asp Lys Leu Phe Pro Asn Pro lie Arg Ala Leu 
15 20 25 

GGA CCA AAC TCG CCT GCG CCG AGA GCC GTC CGC GTA GAG 116 
Gly Pro Asn Ser Pro Ala Pro Arg Ala Val Arg Val Glu 

30 35 

CGC TCC GTC TCC GGC GAG ATG TCC GAG CGC AAA GAA GGC 155 
Arg Ser Val Ser Gly Glu Met Ser Glu Arg Lys Glu Gly 
40 45 50 

AGA GGC AAA GGG AAG GGC AAG AAG AAG GAG CGA GGC TCC 194 
Arg Gly Lys Gly Lys Gly Lys Lys Lys Glu Arg Gly Ser 
55 60 

GGC AAG AAG CCG GAG TCC GCG GCG GGC AGC CAG AGC CCA 233 
Gly Lys Lys Pro Glu Ser Ala Ala Gly Ser Gin Ser Pro 
65 70 75 

GCC TTG CCT CCC CGA TTG AAA GAG ATG AAA AGC CAG GAA 272 
Ala Leu Pro Pro Arg Leu Lys Glu Met Lys Ser Gin Glu 
80 85 90 

TCG GCT GCA GGT TCC AAA CTA GTC CTT CGG TGT GAA ACC 311 
Ser Ala Ala Gly Ser Lys Leu Val Leu Arg Cys Glu Thr 

95 100 

AGT TCT GAA TAC TCC TCT CTC AGA TTC AAG TGG TTC AAG 350 
Ser Ser Glu Tyr Ser Ser Leu Arg Phe Lys Trp Phe Lys 
105 110 115 

AAT GGG AAT GAA TTG AAT CGA AAA AAC AAA CCA CAA AAT 389 
Asn Gly Asn Glu Leu Asn Arg Lys Asn Lys Pro Gin Asn 
120 125 

ATC AAG ATA CAA AAA AAG CCA GGG AAG TCA GAA CTT CGC 428
Ile Lys Ile Gln Lys Lys Pro Gly Lys Ser Glu Leu Arg
130 135 140

ATT AAC AAA GCA TCA CTG GCT GAT TCT GGA GAG TAT ATG 467
Ile Asn Lys Ala Ser Leu Ala Asp Ser Gly Glu Tyr Met
145 150 155 



TGC AAA GTG ATC AGC AAA TTA GGA AAT GAC AGT GCC TCT 506
Cys Lys Val Ile Ser Lys Leu Gly Asn Asp Ser Ala Ser
160 165

FIG. 4A

GCC AAT ATC ACC ATC GTG GAA TCA AAC GAG ATC ATC ACT 545
Ala Asn Ile Thr Ile Val Glu Ser Asn Glu Ile Ile Thr
170 175 180 

GGT ATG CCA GCC TCA ACT GAA GGA GCA TAT GTG TCT TCA 584 
Gly Met Pro Ala Ser Thr Glu Gly Ala Tyr Val Ser Ser 
185 190 

GAG TCT CCC ATT AGA ATA TCA GTA TCC ACA GAA GGA GCA 623 
Glu Ser Pro lie Arg lie Ser Val Ser Thr Glu Gly Ala 
195 200 205 

AAT ACT TCT TCA TCT ACA TCT ACA TCC ACC ACT GGG ACA 662 
Asn Thr Ser Ser Ser Thr Ser Thr Ser Thr Thr Gly Thr 
210 215 220 

AGC CAT CTT GTA AAA TGT GCG GAG AAG GAG AAA ACT TTC 701 
Ser His Leu Val Lys Cys Ala Glu Lys Glu Lys Thr Phe 

225 230 

TGT GTG AAT GGA GGG GAG TGC TTC ATG GTG AAA GAC CTT 740 
Cys Val Asn Gly Gly Glu Cys Phe Met Val Lys Asp Leu 
235 240 245 

TCA AAC CCC TCG AGA TAC TTG TGC AAG TGC CAA CCT GGA 779 
Ser Asn Pro Ser Arg Tyr Leu Cys Lys Cys Gin Pro Gly 
250 255 

TTC ACT GGA GCA AGA TGT ACT GAG AAT GTG CCC ATG AAA 818 
Phe Thr Gly Ala Arg Cys Thr Glu Asn Val Pro Met Lys 
260 265 270 

GTG CAA AAC CAA GAA AAG GCG GAG GAG CTG TAC CAG AAG 857 
Val Gin Asn Gin Glu Lys Ala Glu Glu Leu Tyr Gin Lys 
275 280 285 

AGA GTG CTG ACC ATA ACC GGC ATC TGC ATC GCC CTC CTT 896 
Arg Val Leu Thr lie Thr Gly lie Cys lie Ala Leu Leu 

290 295 

GTG GTC GGC ATC ATG TGT GTG GTG GCC TAC TGC AAA ACC 935
Val Val Gly Ile Met Cys Val Val Ala Tyr Cys Lys Thr
300 305 310

AAG AAA CAG CGG AAA AAG CTG CAT GAC CGT CTT CGG CAG 974
Lys Lys Gln Arg Lys Lys Leu His Asp Arg Leu Arg Gln
315 320

AGC CTT CGG TCT GAA CGA AAC AAT ATG ATG AAC ATT GCC 1013
Ser Leu Arg Ser Glu Arg Asn Asn Met Met Asn Ile Ala
325 330 335

FIG. 4B

SUBSTITUTE SHEET

AAT GGG CCT CAC CAT CCT AAC CCA CCC CCC GAG AAT GTC 1052
Asn Gly Pro His His Pro Asn Pro Pro Pro Glu Asn Val 
340 345 350 

CAG CTG GTG AAT CAA TAC GTA TCT AAA AAC GTC ATC TCC 1091 
Gin Leu Val Asn Gin Tyr Val Ser Lys Asn Val lie Ser 

355 360 

AGT GAG CAT ATT GTT GAG AGA GAA GCA GAG ACA TCC TTT 1130
Ser Glu His lie Val Glu Arg Glu Ala Glu Thr Ser Phe 
365 370 375 

TCC ACC AGT CAC TAT ACT TCC ACA GCC CAT CAC TCC ACT 1169 
Ser Thr Ser His Tyr Thr Ser Thr Ala His His Ser Thr 
380 385 

ACT GTC ACC CAG ACT CCT AGC CAC AGC TGG AGC AAC GGA 1208 
Thr Val Thr Gin Thr Pro Ser His Ser Trp Ser Asn Gly 
390 395 400 

CAC ACT GAA AGC ATC CTT TCC GAA AGC CAC TCT GTA ATC 1247 
His Thr Glu Ser lie Leu Ser Glu Ser His Ser Val lie 
405 410 415 

GTG ATG TCA TCC GTA GAA AAC AGT AGG CAC AGC AGC CCA 1286 
Val Met Ser Ser Val Glu Asn Ser Arg His Ser Ser Pro 

420 425 

ACT GGG GGC CCA AGA GGA CGT CTT AAT GGC ACA GGA GGC 1325 
Thr Gly Gly Pro Arg Gly Arg Leu Asn Gly Thr Gly Gly 
430 435 440 

CCT CGT GAA TGT AAC AGC TTC CTC AGG CAT GCC AGA GAA 1364 
Pro Arg Glu Cys Asn Ser Phe Leu Arg His Ala Arg Glu 
445 450 

ACC CCT GAT TCC TAC CGA GAC TCT CCT CAT AGT GAA AGG 1403 
Thr Pro Asp Ser Tyr Arg Asp Ser Pro His Ser Glu Arg 
455 460 465 

TAT GTG TCA GCC ATG ACC ACC CCG GCT CGT ATG TCA CCT 1442 
Tyr Val Ser Ala Met Thr Thr Pro Ala Arg Met Ser Pro
470 475 480 

GTA GAT TTC CAC ACG CCA AGC TCC CCC AAA TCG CCC CCT 1481
Val Asp Phe His Thr Pro Ser Ser Pro Lys Ser Pro Pro
485 490

TCG GAA ATG TCT CCA CCC GTG TCC AGC ATG ACG GTG TCC 1520
Ser Glu Met Ser Pro Pro Val Ser Ser Met Thr Val Ser
495 500 505

FIG. 4C

ATG CCT TCC ATG GCG GTC AGC CCC TTC ATG GAA GAA GAG 1559
Met Pro Ser Met Ala Val Ser Pro Phe Met Glu Glu Glu 
510 515 

AGA CCT CTA CTT CTC GTG ACA CCA CCA AGG CTG CGG GAG 1598
Arg Pro Leu Leu Leu Val Thr Pro Pro Arg Leu Arg Glu 
520 525 530 

AAG AAG TTT GAC CAT CAC CCT CAG CAG TTC AGC TCC TTC 1637 
Lys Lys Phe Asp His His Pro Gin Gin Phe Ser Ser Phe 
535 540 545 

CAC CAC AAC CCC GCG CAT GAC AGT AAC AGC CTC CCT GCT 1676 
His His Asn Pro Ala His Asp Ser Asn Ser Leu Pro Ala 

550 555 

AGC CCC TTG AGG ATA GTG GAG GAT GAG GAG TAT GAA ACG 1715 
Ser Pro Leu Arg lie Val Glu Asp Glu Glu Tyr Glu Thr 
560 565 570 

ACC CAA GAG TAC GAG CCA GCC CAA GAG CCT GTT AAG AAA 1754 
Thr Gin Glu Tyr Glu Pro Ala Gin Glu Pro Val Lys Lys 
575 580 

CTC GCC AAT AGC CGG CGG GCC AAA AGA ACC AAG CCC AAT 1793 
Leu Ala Asn Ser Arg Arg Ala Lys Arg Thr Lys Pro Asn 
585 590 595 

GGC CAC ATT GCT AAC AGA TTG GAA GTG GAC AGC AAC ACA 1832 
Gly His lie Ala Asn Arg Leu Glu Val Asp Ser Asn Thr 
600 605 610 

AGC TCC CAG AGC AGT AAC TCA GAG AGT GAA ACA GAA GAT 1871 
Ser Ser Gin Ser Ser Asn Ser Glu Ser Glu Thr Glu Asp 

615 620 

GAA AGA GTA GGT GAA GAT ACG CCT TTC CTG GGC ATA CAG 1910
Glu Arg Val Gly Glu Asp Thr Pro Phe Leu Gly Ile Gln
625 630 635

AAC CCC CTG GCA GCC AGT CTT GAG GCA ACA CCT GCC TTC 1949 
Asn Pro Leu Ala Ala Ser Leu Glu Ala Thr Pro Ala Phe 
640 645 

CGC CTG GCT GAC AGC AGG ACT AAC CCA GCA GGC CGC TTC 1988
Arg Leu Ala Asp Ser Arg Thr Asn Pro Ala Gly Arg Phe
650 655 660

TCG ACA CAG GAA GAA ATC CAG G 2010
Ser Thr Gln Glu Glu Ile Gln

FIG. 4D

CELL GROWTH STIMULATION BY HEREGULIN 2-alpha

CONTROL SKBR-3 MCF-7 MB-468



FIG. 7

GG GAC AAA CTT TTC CCA AAC CCG ATC CGA GCC CTT GGA 38
Asp Lys Leu Phe Pro Asn Pro Ile Arg Ala Leu Gly
1 5 10

CCA AAC TCG CCT GCG CCG AGA GCC GTC CGC GTA GAG CGC 77 
Pro Asn Ser Pro Ala Pro Arg Ala Val Arg Val Glu Arg 
15 20 25 

TCC GTC TCC GGC GAG ATG TCC GAG CGC AAA GAA GGC AGA 116 

Ser Val Ser Gly Glu Met Ser Glu Arg Lys Glu Gly Arg 

30 35 

GGC AAA GGG AAG GGC AAG AAG AAG GAG CGA GGC TCC GGC 155 

Gly Lys Gly Lys Gly Lys Lys Lys Glu Arg Gly Ser Gly 
40 45 50 

AAG AAG CCG GAG TCC GCG GCG GGC AGC CAG AGC CCA GCC 194 

Lys Lys Pro Glu Ser Ala Ala Gly Ser Gin Ser Pro Ala 
55 60 

TTG CCT CCC CAA TTG AAA GAG ATG AAA AGC CAG GAA TCG 233 

Leu Pro Pro Gin Leu Lys Glu Met Lys Ser Gin Glu Ser 
65 70 75 

GCT GCA GGT TCC AAA CTA GTC CTT CGG TGT GAA ACC AGT 272 

Ala Ala Gly Ser Lys Leu Val Leu Arg Cys Glu Thr Ser 
80 85 90 

TCT GAA TAC TCC TCT CTC AGA TTC AAG TGG TTC AAG AAT 311 

Ser Glu Tyr Ser Ser Leu Arg Phe Lys Trp Phe Lys Asn 

95 100 

GGG AAT GAA TTG AAT CGA AAA AAC AAA CCA CAA AAT ATC 350 

Gly Asn Glu Leu Asn Arg Lys Asn Lys Pro Gin Asn lie 
105 110 115 

AAG ATA CAA AAA AAG CCA GGG AAG TCA GAA CTT CGC ATT 389 

Lys lie Gin Lys Lys Pro Gly Lys Ser Glu Leu Arg lie 
120 125 

AAC AAA GCA TCA CTG GCT GAT TCT GGA GAG TAT ATG TGC 428 

Asn Lys Ala Ser Leu Ala Asp Ser Gly Glu Tyr Met Cys 

130 135 140 

AAA GTG ATC AGC AAA TTA GGA AAT GAC AGT GCC TCT GCC 467 

Lys Val Ile Ser Lys Leu Gly Asn Asp Ser Ala Ser Ala
145 150 155

AAT ATC ACC ATC GTG GAA TCA AAC GAG ATC ATC ACT GGT 506

Asn lie Thr lie Val Glu Ser Asn Glu lie lie Thr Gly 

160 165 



FIG. 8A

ATG CCA GCC TCA ACT GAA GGA GCA TAT GTG TCT TCA GAG 545
Met Pro Ala Ser Thr Glu Gly Ala Tyr Val Ser Ser Glu
170 175 180

TCT CCC ATT AGA ATA TCA GTA TCC ACA GAA GGA GCA AAT 584
Ser Pro Ile Arg Ile Ser Val Ser Thr Glu Gly Ala Asn
185 190

ACT TCT TCA TCT ACA TCT ACA TCC ACC ACT GGG ACA AGC 623
Thr Ser Ser Ser Thr Ser Thr Ser Thr Thr Gly Thr Ser
195 200 205

CAT CTT GTA AAA TGT GCG GAG AAG GAG AAA ACT TTC TGT 662
His Leu Val Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys
210 215 220

GTG AAT GGA GGG GAG TGC TTC ATG GTG AAA GAC CTT TCA 701
Val Asn Gly Gly Glu Cys Phe Met Val Lys Asp Leu Ser
225 230

AAC CCC TCG AGA TAC TTG TGC AAG TGC CCA AAT GAG TTT 740
Asn Pro Ser Arg Tyr Leu Cys Lys Cys Pro Asn Glu Phe
235 240 245

ACT GGT GAT CGC TGC CAA AAC TAC GTA ATG GCC AGC TTC 779
Thr Gly Asp Arg Cys Gln Asn Tyr Val Met Ala Ser Phe
250 255

TAC AAG CAT CTT GGG ATT GAA TTT ATG GAG GCG GAG GAG 818
Tyr Lys His Leu Gly Ile Glu Phe Met Glu Ala Glu Glu
260 265 270

CTG TAC CAG AAG AGA GTG CTG ACC ATA ACC GGC ATC TGC 857
Leu Tyr Gln Lys Arg Val Leu Thr Ile Thr Gly Ile Cys
275 280 285

ATC GCC CTC CTT GTG GTC GGC ATC ATG TGT GTG GTG GCC 896
Ile Ala Leu Leu Val Val Gly Ile Met Cys Val Val Ala
290 295

TAC TGC AAA ACC AAG AAA CAG CGG AAA AAG CTG CAT GAC 935
Tyr Cys Lys Thr Lys Lys Gln Arg Lys Lys Leu His Asp
300 305 310

CGT CTT CGG CAG AGC CTT CGG TCT GAA CGA AAC AAT ATG 974
Arg Leu Arg Gln Ser Leu Arg Ser Glu Arg Asn Asn Met
315 320

ATG AAC ATT GCC AAT GGG CCT CAC CAT CCT AAC CCA CCC 1013
Met Asn Ile Ala Asn Gly Pro His His Pro Asn Pro Pro
325 330 335

FIG. 8B

CCC GAG AAT GTC CAG CTG GTG AAT CAA TAC GTA TCT AAA 1052
Pro Glu Asn Val Gln Leu Val Asn Gln Tyr Val Ser Lys
340 345 350

AAC GTC ATC TCC AGT GAG CAT ATT GTT GAG AGA GAA GCA 1091 
Asn Val lie Ser Ser Glu His He Val Glu Arg Glu Ala 

355 360 

GAG ACA TCC TTT TCC ACC AGT CAC TAT ACT TCC ACA GCC 1130 
Glu Thr Ser Phe Ser Thr Ser His Tyr Thr Ser Thr Ala 
365 370 375 

CAT CAC TCC ACT ACT GTC ACC CAG ACT CCT AGC CAC AGC 1169 
His His Ser Thr Thr Val Thr Gin Thr Pro Ser His Ser 
380 385 

TGG AGC AAC GGA CAC ACT GAA AGC ATC CTT TCC GAA AGC 1208 
Trp Ser Asn Gly His Thr Glu Ser He Leu Ser Glu Ser 
390 395 400 

CAC TCT GTA ATC GTG ATG TCA TCC GTA GAA AAC AGT AGG 1247 
His Ser Val He Val Met Ser Ser Val Glu Asn Ser Arg 
405 410 415 

CAC AGC AGC CCA ACT GGG GGC CCA AGA GGA CGT CTT AAT 1286 
His Ser Ser Pro Thr Gly Gly Pro Arg Gly Arg Leu Asn 

420 425 

GGC ACA GGA GGC CCT CGT GAA TGT AAC AGC TTC CTC AGG 1325 
Gly Thr Gly Gly Pro Arg Glu Cys Asn Ser Phe Leu Arg 
430 435 440 

CAT GCC AGA GAA ACC CCT GAT TCC TAC CGA GAC TCT CCT 1364
His Ala Arg Glu Thr Pro Asp Ser Tyr Arg Asp Ser Pro 
445 450 

CAT AGT GAA AGG TAT GTG TCA GCC ATG ACC ACC CCG GCT 1403 
His Ser Glu Arg Tyr Val Ser Ala Met Thr Thr Pro Ala 
455 460 465 

CGT ATG TCA CCT GTA GAT TTC CAC ACG CCA AGC TCC CCC 1442 
Arg Met Ser Pro Val Asp Phe His Thr Pro Ser Ser Pro 
470 475 480 

AAA TCG CCC CCT TCG GAA ATG TCT CCA CCC GTG TCC AGC 1481
Lys Ser Pro Pro Ser Glu Met Ser Pro Pro Val Ser Ser 

485 490 

ATG ACG GTG TCC ATG CCT TCC ATG GCG GTC AGC CCC TTC 1520 
Met Thr Val Ser Met Pro Ser Met Ala Val Ser Pro Phe
495 500 505

ATG GAA GAA GAG AGA CCT CTA CTT CTC GTG ACA CCA CCA 1559
Met Glu Glu Glu Arg Pro Leu Leu Leu Val Thr Pro Pro 
510 515 

AGG CTG CGG GAG AAG AAG TTT GAC CAT CAC CCT CAG CAG 1598 
Arg Leu Arg Glu Lys Lys Phe Asp His His Pro Gin Gin 
520 525 530 

TTC AGC TCC TTC CAC CAC AAC CCC GCG CAT GAC AGT AAC 1637 
Phe Ser Ser Phe His His Asn Pro Ala His Asp Ser Asn 
535 540 545 

AGC CTC CCT GCT AGC CCC TTG AGG ATA GTG GAG GAT GAG 1676 
Ser Leu Pro Ala Ser Pro Leu Arg lie Val Glu Asp Glu 

550 555 

GAG TAT GAA ACG ACC CAA GAG TAC GAG CCA GCC CAA GAG 1715 
Glu Tyr Glu Thr Thr Gin Glu Tyr Glu Pro Ala Gin Glu 
560 565 570 

CCT GTT AAG AAA CTC GCC AAT AGC CGG CGG GCC AAA AGA 1754 
Pro Val Lys Lys Leu Ala Asn Ser Arg Arg Ala Lys Arg 
575 580 

ACC AAG CCC AAT GGC CAC ATT GCT AAC AGA TTG GAA GTG 1793 
Thr Lys Pro Asn Gly His lie Ala Asn Arg Leu Glu Val 
585 590 595 

GAC AGC AAC ACA AGC TCC CAG AGC AGT AAC TCA GAG AGT 1832 
Asp Ser Asn Thr Ser Ser Gin Ser Ser Asn Ser Glu Ser 
600 605 610 

GAA ACA GAA GAT GAA AGA GTA GGT GAA GAT ACG CCT TTC 1871 
Glu Thr Glu Asp Glu Arg Val Gly Glu Asp Thr Pro Phe 

615 620 

CTG GGC ATA CAG AAC CCC CTG GCA GCC AGT CTT GAG GCA 1910 
Leu Gly lie Gin Asn Pro Leu Ala Ala Ser Leu Glu Ala 
625 630 635 

ACA CCT GCC TTC CGC CTG GCT GAC AGC AGG ACT AAC CCA 1949 
Thr Pro Ala Phe Arg Leu Ala Asp Ser Arg Thr Asn Pro 
640 645 

GCA GGC CGC TTC TCG ACA CAG GAA GAA ATC CAG GCC AGG 1988
Ala Gly Arg Phe Ser Thr Gln Glu Glu Ile Gln Ala Arg
650 655 660 

CTG TCT AGT GTA ATT GCT AAC CAA GAC CCT ATT GCT GTA TA 20 
Leu Ser Ser Val Ile Ala Asn Gln Asp Pro Ile Ala Val
665 670 675

STIMULATION OF HER2 AUTOPHOSPHORYLATION

[Plot; x-axis: HRG2 (7K) [nM], logarithmic ticks]

FIG. 10

AA AGA GCC GGC GAG GAG TTC CCC GAA ACT TGT TGG AAC 38
Arg Ala Gly Glu Glu Phe Pro Glu Thr Cys Trp Asn
1 5 10

TCC GGG CTC GCG CGG AGG CCA GGA GCT GAG CGG CGG CGG 77
Ser Gly Leu Ala Arg Arg Pro Gly Ala Glu Arg Arg Arg 
15 20 25 

CTG CCG GAC GAT GGG AGC GTG AGC AGG ACG GTG ATA ACC 116 
Leu Pro Asp Asp Gly Ser Val Ser Arg Thr Val lie Thr 

30 35 

TCT CCC CGA TCG GGT TGC GAG GGC GCC GGG CAG AGG CCA 155 
Ser Pro Arg Ser Gly Cys Glu Gly Ala Gly Gin Arg Pro 
40 45 50 

GGA CGC GAG CCG CCA GCG GTG GGA CCC ATC GAC GAC TTC 194 
Gly Arg Glu Pro Pro Ala Val Gly Pro lie Asp Asp Phe 
55 60 

CCG GGG CGA CAG GAG CAG CCC CGA GAG CCA GGG CGA GCG 233 
Pro Gly Arg Gin Glu Gin Pro Arg Glu Pro Gly Arg Ala 
65 70 75 

CCC GTT CCA GGT GGC CGG ACC GCC CGC CGC GTC CGC GCC 272 
Pro Val Pro Gly Gly Arg Thr Ala Arg Arg Val Arg Ala 
80 85 90 

GCG CTC CCT GCA GGC AAC GGG AGA CGC CCC CGC GCA GCG 311 
Ala Leu Pro Ala Gly Asn Gly Arg Arg Pro Arg Ala Ala 

95 100 

CGA GCG CCT CAG CGC GGC CGC TCG CTC TCC CCC TCG AGG 350 
Arg Ala Pro Gin Arg Gly Arg Ser Leu Ser Pro Ser Arg 
105 110 115 

GAC AAA CTT TTC CCA AAC CCG ATC CGA GCC CTT GGA CCA 389 
Asp Lys Leu Phe Pro Asn Pro lie Arg Ala Leu Gly Pro 
120 125 

AAC TCG CCT GCG CCG AGA GCC GTC CGC GTA GAG CGC TCC 428 
Asn Ser Pro Ala Pro Arg Ala Val Arg Val Glu Arg Ser 
130 135 140 

GTC TCC GGC GAG ATG TCC GAG CGC AAA GAA GGC AGA GGC 467 
Val Ser Gly Glu Met Ser Glu Arg Lys Glu Gly Arg Gly 
145 150 155

AAA GGG AAG GGC AAG AAG AAG GAG CGA GG 496 
Lys Gly Lys Gly Lys Lys Lys Glu Arg 

160 164 
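
Each figure row above pairs a run of codons with its translation and ends with a cumulative nucleotide count. As an illustration only, the Python sketch below translates a row with the standard genetic code and checks the running count; it can help spot scanning errors in either line. The helper names `translate_row` and `bases_in_row` are hypothetical, and the sketch assumes the trailing count has been removed from the row before it is passed in.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate_row(row):
    """Translate the complete codons in a row; a trailing partial codon is ignored."""
    codons = [tok for tok in row.split() if len(tok) == 3]
    return "".join(CODON_TABLE[codon] for codon in codons)


def bases_in_row(row):
    """Count the nucleotides in a row, including any partial codon."""
    return sum(len(tok) for tok in row.split())


# Last row of the figure above: nine codons plus two bases, ending at base 496.
last_row = "AAA GGG AAG GGC AAG AAG AAG GAG CGA GG"
print(translate_row(last_row))       # KGKGKKKER
print(467 + bases_in_row(last_row))  # 496
```

The output agrees with the printed translation (Lys Gly Lys Gly Lys Lys Lys Glu Arg) and with the printed count of 496 that follows the previous row's 467.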
GTGGCTGCGG GGCAATTGAA AAAGAGCCGG CGAGGAGTTC CCCGAAACTT 50



GTTGGAACTC CGGGCTCGCG CGGAGGCCAG GAGCTGAGCG GCGGCGGCTG 100 
CCGGACGATG GGAGCGTGAG CAGGACGGTG ATAACCTCTC CCCGATCGGG 150 
TTGCGAGGGC GCCGGGCAGA GGCCAGGACG CGAGCCGCCA GCGGCGGGAC 200
CCATCGACGA CTTCCCGGGG CGACAGGAGC AGCCCCGAGA GCCAGGGCGA 250 
GCGCCCGTTC CAGGTGGCCG GACCGCCCGC CGCGTCCGCG CCGCGCTCCC 300 
TGCAGGCAAC GGGAGACGCC CCCGCGCAGC GCGAGCGCCT CAGCGCGGCC 350 
GCTCGCTCTC CCCATCGAGG GACAAACTTT TCCCAAACCC GATCCGAGCC 400 
CTTGGACCAA ACTCGCCTGC GCCGAGAGCC GTCCGCGTAG AGCGCTCCGT 450 



CTCCGGCGAG ATG TCC GAG CGC AAA GAA GGC AGA GGC AAA 490 
Met Ser Glu Arg Lys Glu Gly Arg Gly Lys 
1 5 10

GGG AAG GGC AAG AAG AAG GAG CGA GGC TCC GGC AAG AAG 529
Gly Lys Gly Lys Lys Lys Glu Arg Gly Ser Gly Lys Lys 

15 20 

CCG GAG TCC GCG GCG GGC AGC CAG AGC CCA GCC TTG CCT 568
Pro Glu Ser Ala Ala Gly Ser Gin Ser Pro Ala Leu Pro 
25 30 35 

CCC CAA TTG AAA GAG ATG AAA AGC CAG GAA TCG GCT GCA 607 
Pro Gin Leu Lys Glu Met Lys Ser Gin Glu Ser Ala Ala 
40 45 

GGT TCC AAA CTA GTC CTT CGG TGT GAA ACC AGT TCT GAA 646
Gly Ser Lys Leu Val Leu Arg Cys Glu Thr Ser Ser Glu
50 55 60

TAC TCC TCT CTC AGA TTC AAG TGG TTC AAG AAT GGG AAT 685
Tyr Ser Ser Leu Arg Phe Lys Trp Phe Lys Asn Gly Asn
65 70 75

GAA TTG AAT CGA AAA AAC AAA CCA CAA AAT ATC AAG ATA 724
Glu Leu Asn Arg Lys Asn Lys Pro Gln Asn Ile Lys Ile
80 85

FIG. 12A

CAA AAA AAG CCA GGG AAG TCA GAA CTT CGC ATT AAC AAA 763
Gln Lys Lys Pro Gly Lys Ser Glu Leu Arg Ile Asn Lys
90 95 100

GCA TCA CTG GCT GAT TCT GGA GAG TAT ATG TGC AAA GTG 802 
Ala Ser Leu Ala Asp Ser Gly Glu Tyr Met Cys Lys Val 
105 110 

ATC AGC AAA TTA GGA AAT GAC AGT GCC TCT GCC AAT ATC 841 
He Ser Lys Leu Gly Asn Asp Ser Ala Ser Ala Asn He 
115 120 125 

ACC ATC GTG GAA TCA AAC GAG ATC ATC ACT GGT ATG CCA 880 
Thr He Val Glu Ser Asn Glu He He Thr Gly Met Pro 
130 135 140 

GCC TCA ACT GAA GGA GCA TAT GTG TCT TCA GAG TCT CCC 919 
Ala Ser Thr Glu Gly Ala Tyr Val Ser Ser Glu Ser Pro 

145 150 

ATT AGA ATA TCA GTA TCC ACA GAA GGA GCA AAT ACT TCT 958 
He Arg He Ser Val Ser Thr Glu Gly Ala Asn Thr Ser 
155 160 165 

TCA TCT ACA TCT ACA TCC ACC ACT GGG ACA AGC CAT CTT 997 
Ser Ser Thr Ser Thr Ser Thr Thr Gly Thr Ser His Leu 
170 175 

GTA AAA TGT GCG GAG AAG GAG AAA ACT TTC TGT GTG AAT 1036 
Val Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys Val Asn 
180 185 190 

GGA GGG GAG TGC TTC ATG GTG AAA GAC CTT TCA AAC CCC 1075 
Gly Gly Glu Cys Phe Met Val Lys Asp Leu Ser Asn Pro 
195 200 205 

TCG AGA TAC TTG TGC AAG TGC CCA AAT GAG TTT ACT GGT 1114 
Ser Arg Tyr Leu Cys Lys Cys Pro Asn Glu Phe Thr Gly 

210 215 

GAT CGC TGC CAA AAC TAC GTA ATG GCC AGC TTC TAC AAG 1153 
Asp Arg Cys Gin Asn Tyr Val Met Ala Ser Phe Tyr Lys 
220 225 230 

GCG GAG GAG CTG TAC CAG AAG AGA GTG CTG ACC ATA ACC 1192 
Ala Glu Glu Leu Tyr Gin Lys Arg Val Leu Thr He Thr 
235 240 

GGC ATC TGC ATC GCC CTC CTT GTG GTC GGC ATC ATG TGT 1231 
Gly Ile Cys Ile Ala Leu Leu Val Val Gly Ile Met Cys
245 250 255 

GTG GTG GCC TAC TGC AAA ACC AAG AAA CAG CGG AAA AAG 1270 
Val Val Ala Tyr Cys Lys Thr Lys Lys Gln Arg Lys Lys

FIG. 12B
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CTG CAT GAC CGT CTT CGG CAG AGC CTT CGG TCT GAA CGA 1309

Leu His Asp Arg Leu Arg Gln Ser Leu Arg Ser Glu Arg

275 280 

AAC AAT ATG ATG AAC ATT GCC AAT GGG CCT CAC CAT CCT 1348 
Asn Asn Met Met Asn Ile Ala Asn Gly Pro His His Pro
285 290 295 

AAC CCA CCC CCC GAG AAT GTC CAG CTG GTG AAT CAA TAC 1387
Asn Pro Pro Pro Glu Asn Val Gln Leu Val Asn Gln Tyr
300 305 

GTA TCT AAA AAC GTC ATC TCC AGT GAG CAT ATT GTT GAG 1426 
Val Ser Lys Asn Val Ile Ser Ser Glu His Ile Val Glu
310 315 320 

AGA GAA GCA GAG ACA TCC TTT TCC ACC AGT CAC TAT ACT 1465 
Arg Glu Ala Glu Thr Ser Phe Ser Thr Ser His Tyr Thr 
325 330 335 

TCC ACA GCC CAT CAC TCC ACT ACT GTC ACC CAG ACT CCT 1504 
Ser Thr Ala His His Ser Thr Thr Val Thr Gln Thr Pro

340 345 

AGC CAC AGC TGG AGC AAC GGA CAC ACT GAA AGC ATC CTT 1543
Ser His Ser Trp Ser Asn Gly His Thr Glu Ser Ile Leu
350 355 360 

TCC GAA AGC CAC TCT GTA ATC GTG ATG TCA TCC GTA GAA 1582 
Ser Glu Ser His Ser Val Ile Val Met Ser Ser Val Glu
365 370 

AAC AGT AGG CAC AGC AGC CCA ACT GGG GGC CCA AGA GGA 1621
Asn Ser Arg His Ser Ser Pro Thr Gly Gly Pro Arg Gly 
375 380 385 

CGT CTT AAT GGC ACA GGA GGC CCT CGT GAA TGT AAC AGC 1660 
Arg Leu Asn Gly Thr Gly Gly Pro Arg Glu Cys Asn Ser 
390 395 400 

TTC CTC AGG CAT GCC AGA GAA ACC CCT GAT TCC TAC CGA 1699
Phe Leu Arg His Ala Arg Glu Thr Pro Asp Ser Tyr Arg 

405 410 

GAC TCT CCT CAT AGT GAA AGG TAT GTG TCA GCC ATG ACC 1738 
Asp Ser Pro His Ser Glu Arg Tyr Val Ser Ala Met Thr 
415 420 425 

ACC CCG GCT CGT ATG TCA CCT GTA GAT TTC CAC ACG CCA 1777 
Thr Pro Ala Arg Met Ser Pro Val Asp Phe His Thr Pro 
430 435 

AGC TCC CCC AAA TCG CCC CCT TCG GAA ATG TCT CCA CCC 1816 
Ser Ser Pro Lys Ser Pro Pro Ser Glu Met Ser Pro Pro 
440 445 450

FIG. 12C
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GTG TCC AGC ATG ACG GTG TCC AA^ CCT TCC ATG GCG GTC 1855
Val Ser Ser Met Thr Val Ser Lys Pro Ser Met Ala Val 
455 460 465 

AGC CCC TTC ATG GAA GAA GAG AGA CCT CTA CTT CTC GTG 1894 
Ser Pro Phe Met Glu Glu Glu Arg Pro Leu Leu Leu Val 

470 475 

ACA CCA CCA AGG CTG CGG GAG AAG AAG TTT GAC CAT CAC 1933 
Thr Pro Pro Arg Leu Arg Glu Lys Lys Phe Asp His His 
480 485 490 

CCT CAG CAG TTC AGC TCC TTC CAC CAC AAC CCC GCG CAT 1972 
Pro Gln Gln Phe Ser Ser Phe His His Asn Pro Ala His
495 500 

GAC AGT AAC AGC CTC CCT GCT AGC CCC TTG AGG ATA GTG 2011 
Asp Ser Asn Ser Leu Pro Ala Ser Pro Leu Arg Ile Val
505 510 515 

GAG GAT GAG GAG TAT GAA ACG ACC CAA GAG TAC GAG CCA 2050 
Glu Asp Glu Glu Tyr Glu Thr Thr Gln Glu Tyr Glu Pro
520 525 530 

GCC CAA GAG CCT GTT AAG AAA CTC GCC AAT AGC CGG CGG 2089 
Ala Gln Glu Pro Val Lys Lys Leu Ala Asn Ser Arg Arg

535 540 

GCC AAA AGA ACC AAG CCC AAT GGC CAC ATT GCT AAC AGA 2128 
Ala Lys Arg Thr Lys Pro Asn Gly His Ile Ala Asn Arg
545 550 555 

TTG GAA GTG GAC AGC AAC ACA AGC TCC CAG AGC AGT AAC 2167 
Leu Glu Val Asp Ser Asn Thr Ser Ser Gln Ser Ser Asn
560 565 

TCA GAG AGT GAA ACA GAA GAT GAA AGA GTA GGT GAA GAT 2206 
Ser Glu Ser Glu Thr Glu Asp Glu Arg Val Gly Glu Asp 
570 575 580 

ACG CCT TTC CTG GGC ATA CAG AAC CCC CTG GCA GCC AGT 2245 
Thr Pro Phe Leu Gly Ile Gln Asn Pro Leu Ala Ala Ser
585 590 595 

CTT GAG GCA ACA CCT GCC TTC CGC CTG GCT GAC AGC AGG 2284 
Leu Glu Ala Thr Pro Ala Phe Arg Leu Ala Asp Ser Arg 

600 605 

ACT AAC CCA GCA GGC CGC TTC TCG ACA CAG GAA GAA ATC 2323 
Thr Asn Pro Ala Gly Arg Phe Ser Thr Gln Glu Glu Ile
610 615 620 

CAG GCC AGG CTG TCT AGT GTA ATT GCT AAC CAA GAC CCT 2362 
Gln Ala Arg Leu Ser Ser Val Ile Ala Asn Gln Asp Pro

FIG. 12D
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ATT GCT GTA TAAAACCTA AATAAACACA TAGATTCACC TGTAAAACTT 2410 
Ile Ala Val
635 637

TATTTTATAT AATAAAGTAT TCCACCTTAA ATTAAACAAT TTATTTTATT 2460 



TTAGCAGTTC TGCAAATAAA AAAAAAAAAA 2490 



FIG. 12E
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GCGCCTGCCT CCAACCTGCG GGCGGGAGGT GGGTGGCTGC GGGGCAATTG 50 



AAAAAGAGCC GGCGAGGAGT TCCCCGAAAC TTGTTGGAAC TCCGGGCTCG 100 
CGCGGAGGCC AGGAGCTGAG CGGCGGCGGC TGCCGGACGA TGGGAGCGTG 150 
AGCAGGACGG TGATAACCTC TCCCCGATCG GGTTGCGAGG GCGCCGGGCA 200 
GAGGCCAGGA CGCGAGCCGC CAGCGGCGGG ACCCATCGAC GACTTCCCGG 250
GGCGACAGGA GCAGCCCCGA GAGCCAGGGC GAGCGCCCGT TCCAGGTGGC 300 
CGGACCGCCC GCCGCGTCCG CGCCGCGCTC CCTGCAGGCA ACGGGAGACG 350 
CCCCCGCGCA GCGCGAGCGC CTCAGCGCGG CCGCTCGCTC TCCCCATCGA 400 
GGGACAAACT TTTCCCAAAC CCGATCCGAG CCCTTGGACC AAACTCGCCT 450 



GCGCCGAGAG CCGTCCGCGT AGAGCGCTCC GTCTCCGGCG AG ATG 495 

Met 
1 

TCC GAG CGC AAA GAA GGC AGA GGC AAA GGG AAG GGC AAG 534 
Ser Glu Arg Lys Glu Gly Arg Gly Lys Gly Lys Gly Lys 
5 10 

AAG AAG GAG CGA GGC TCC GGC AAG AAG CCG GAG TCC GCG 573 
Lys Lys Glu Arg Gly Ser Gly Lys Lys Pro Glu Ser Ala 
15 20 25 

GCG GGC AGC CAG AGC CCA GCC TTG CCT CCC CAA TTG AAA 612 
Ala Gly Ser Gln Ser Pro Ala Leu Pro Pro Gln Leu Lys
30 35 40 

GAG ATG AAA AGC CAG GAA TCG GCT GCA GGT TCC AAA CTA 651 
Glu Met Lys Ser Gln Glu Ser Ala Ala Gly Ser Lys Leu

45 50 

GTC CTT CGG TGT GAA ACC AGT TCT GAA TAC TCC TCT CTC 690 
Val Leu Arg Cys Glu Thr Ser Ser Glu Tyr Ser Ser Leu 
55 60 65 

AGA TTC AAG TGG TTC AAG AAT GGG AAT GAA TTG AAT CGA 729 
Arg Phe Lys Trp Phe Lys Asn Gly Asn Glu Leu Asn Arg 
70 75 

FIG. 13A
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

AAA AAC AAA CCA CAA AAT ATC AAG ATA CAA AAA AAG CCA 768
Lys Asn Lys Pro Gln Asn Ile Lys Ile Gln Lys Lys Pro
80 85 90 

GGG AAG TCA GAA CTT CGC ATT AAC AAA GCA TCA CTG GCT 807
Gly Lys Ser Glu Leu Arg Ile Asn Lys Ala Ser Leu Ala
95 100 105 

GAT TCT GGA GAG TAT ATG TGC AAA GTG ATC AGC AAA TTA 846 
Asp Ser Gly Glu Tyr Met Cys Lys Val Ile Ser Lys Leu

110 115 

GGA AAT GAC AGT GCC TCT GCC AAT ATC ACC ATC GTG GAA 885 
Gly Asn Asp Ser Ala Ser Ala Asn Ile Thr Ile Val Glu
120 125 130 

TCA AAC GAG ATC ATC ACT GGT ATG CCA GCC TCA ACT GAA 924 
Ser Asn Glu Ile Ile Thr Gly Met Pro Ala Ser Thr Glu
135 140 

GGA GCA TAT GTG TCT TCA GAG TCT CCC ATT AGA ATA TCA 963 
Gly Ala Tyr Val Ser Ser Glu Ser Pro Ile Arg Ile Ser
145 150 155 

GTA TCC ACA GAA GGA GCA AAT ACT TCT TCA TCT ACA TCT 1002 
Val Ser Thr Glu Gly Ala Asn Thr Ser Ser Ser Thr Ser 
160 165 170 

ACA TCC ACC ACT GGG ACA AGC CAT CTT GTA AAA TGT GCG 1041 
Thr Ser Thr Thr Gly Thr Ser His Leu Val Lys Cys Ala 

175 180 

GAG AAG GAG AAA ACT TTC TGT GTG AAT GGA GGG GAG TGC 1080 
Glu Lys Glu Lys Thr Phe Cys Val Asn Gly Gly Glu Cys 
185 190 195 

TTC ATG GTG AAA GAC CTT TCA AAC CCC TCG AGA TAC TTG 1119 
Phe Met Val Lys Asp Leu Ser Asn Pro Ser Arg Tyr Leu 
200 205 

TGC AAG TGC CCA AAT GAG TTT ACT GGT GAT CGC TGC CAA 1158 
Cys Lys Cys Pro Asn Glu Phe Thr Gly Asp Arg Cys Gln
210 215 220 

AAC TAC GTA ATG GCC AGC TTC TAC AGT ACG TCC ACT CCC 1197 
Asn Tyr Val Met Ala Ser Phe Tyr Ser Thr Ser Thr Pro 
225 230 235 

TTT CTG TCT CTG CCT GAA TAGGA GCATGCTCAG TTGGTGCTGC 1240 
Phe Leu Ser Leu Pro Glu 

240 241 

TTTCTTGTTG CTGCATCTCC CCTCAGATTC CACCTAGAGC TAGATGTGTC 1290
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TTACCAGATC TAATATTGAC TGCCTCTGCC TGTCGCATGA GAACATTAAC 1340 
AAAAGCAATT GTATTACTTC CTCTGTTCGC GACTAGTTGG CTCTGAGATA 1390 
CTAATAGGTG TGTGAGGCTC CGGATGTTTC TGGAATTGAT ATTGAATGAT 1440 
GTGATACAAA TTGATAGTCA ATATCAAGCA GTGAAATATG ATAATAAAGG 1490 
CATTTCAAAG TCTCACTTTT ATTGATAAAA TAAAAATCAT TCTACTGAAC 1540 
AGTCCATCTT CTTTATACAA TGACCACATC CTGAAAAGGG TGTTGCTAAG 1590 
CTGTAACCGA TATGCACTTG AAATGATGGT AAGTTAATTT TGATTCAGAA 1640 
TGTGTTATTT GTCACAAATA AACATAATAA AAGGAGTTCA GATGTTTTTC 1690 
TTCATTAACC AAAAAAAAAA AAAAA 1715 

FIG. 13C
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GAGGCGCCTG CCTCCAACCT GCGGGCGGGA GGTGGGTGGC TGCGGGGCAA 50



TTGAAAAAGA GCCGGCGAGG AGTTCCCCGA AACTTGTTGG AACTCCGGGC 100
TCGCGCGGAG GCCAGGAGCT GAGCGGCGGC GGCTGCCGGA CGATGGGAGC 150 
GTGAGCAGGA CGGTGATAAC CTCTCCCCGA TCGGGTTGCG AGGGCGCCGG 200 
GCAGAGGCCA GGACGCGAGC CGCCAGCGGC GGGACCCATC GACGACTTCC 250 
CGGGGCGACA GGAGCAGCCC CGAGAGCCAG GGCGAGCGCC CGTTCCAGGT 300
GGCCGGACCG CCCGCCGCGT CCGCGCCGCG CTCCCTGCAG GCAACGGGAG 350 
ACGCCCCCGC GCAGCGCGAG CGCCTCAGCG CGGCCGCTCG CTCTCCCCAT 400 
CGAGGGACAA ACTTTTCCCA AACCCGATCC GAGCCCTTGG ACCAAACTCG 450 



CCTGCGCCGA GAGCCGTCCG CGTAGAGCGC TCCGTCTCCG GCGAG AT 497 

Met 
1 

G TCC GAG CGC AAA GAA GGC AGA GGC AAA GGG AAG GGC AAG 537 
Ser Glu Arg Lys Glu Gly Arg Gly Lys Gly Lys Gly Lys 
5 10 

AAG AAG GAG CGA GGC TCC GGC AAG AAG CCG GAG TCC GCG 576 
Lys Lys Glu Arg Gly Ser Gly Lys Lys Pro Glu Ser Ala 
15 20 25 

GCG GGC AGC CAG AGC CCA GCC TTG CCT CCC CAA TTG AAA 615 
Ala Gly Ser Gln Ser Pro Ala Leu Pro Pro Gln Leu Lys
30 35 40 

GAG ATG AAA AGC CAG GAA TCG GCT GCA GGT TCC AAA CTA 654 
Glu Met Lys Ser Gln Glu Ser Ala Ala Gly Ser Lys Leu

45 50 

GTC CTT CGG TGT GAA ACC AGT TCT GAA TAC TCC TCT CTC 693 
Val Leu Arg Cys Glu Thr Ser Ser Glu Tyr Ser Ser Leu 
55 60 65 

AGA TTC AAG TGG TTC AAG AAT GGG AAT GAA TTG AAT CGA 732 
Arg Phe Lys Trp Phe Lys Asn Gly Asn Glu Leu Asn Arg 
70 75 
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AAA AAC AAA CCA CAA AAT ATC AAG ATA CAA AAA AAG CCA 771
Lys Asn Lys Pro Gln Asn Ile Lys Ile Gln Lys Lys Pro
80 85 90 

GGG AAG TCA GAA CTT CGC ATT AAC AAA GCA TCA CTG GCT 810
Gly Lys Ser Glu Leu Arg Ile Asn Lys Ala Ser Leu Ala
95 100 105 

GAT TCT GGA GAG TAT ATG TGC AAA GTG ATC AGC AAA TTA 849
Asp Ser Gly Glu Tyr Met Cys Lys Val Ile Ser Lys Leu

110 115 

GGA AAT GAC AGT GCC TCT GCC AAT ATC ACC ATC GTG GAA 888
Gly Asn Asp Ser Ala Ser Ala Asn Ile Thr Ile Val Glu
120 125 130 

TCA AAC GAG ATC ATC ACT GGT ATG CCA GCC TCA ACT GAA 927 
Ser Asn Glu Ile Ile Thr Gly Met Pro Ala Ser Thr Glu
135 140 

GGA GCA TAT GTG TCT TCA GAG TCT CCC ATT AGA ATA TCA 966 
Gly Ala Tyr Val Ser Ser Glu Ser Pro Ile Arg Ile Ser
145 150 155 

GTA TCC ACA GAA GGA GCA AAT ACT TCT TCA TCT ACA TCT 1005
Val Ser Thr Glu Gly Ala Asn Thr Ser Ser Ser Thr Ser 
160 165 170 

ACA TCC ACC ACT GGG ACA AGC CAT CTT GTA AAA TGT GCG 1044 
Thr Ser Thr Thr Gly Thr Ser His Leu Val Lys Cys Ala 

175 180 

GAG AAG GAG AAA ACT TTC TGT GTG AAT GGA GGG GAG TGC 1083 
Glu Lys Glu Lys Thr Phe Cys Val Asn Gly Gly Glu Cys 
185 190 195 

TTC ATG GTG AAA GAC CTT TCA AAC CCC TCG AGA TAC TTG 1122 
Phe Met Val Lys Asp Leu Ser Asn Pro Ser Arg Tyr Leu 
200 205 

TGC AAG TGC CCA AAT GAG TTT ACT GGT GAT CGC TGC CAA 1161 
Cys Lys Cys Pro Asn Glu Phe Thr Gly Asp Arg Cys Gln
210 215 220 

AAC TAC GTA ATG GCC AGC TTC TAC AAG GCG GAG GAG CTG 1200 
Asn Tyr Val Met Ala Ser Phe Tyr Lys Ala Glu Glu Leu 
225 230 235 

TAC CAG AAG AGA GTG CTG ACC ATA ACC GGC ATC TGC ATC 1239 
Tyr Gln Lys Arg Val Leu Thr Ile Thr Gly Ile Cys Ile

240 245 

GCC CTC CTT GTG GTC GGC ATC ATG TGT GTG GTG GCC TAC 1278
Ala Leu Leu Val Val Gly Ile Met Cys Val Val Ala Tyr

FIG. 14B
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TGC AAA ACC AAG AAA CAG CGG AAA AAG CTG CAT GAC CGT 1317 
Cys Lys Thr Lys Lys Gln Arg Lys Lys Leu His Asp Arg
265 270 

CTT CGG CAG AGC CTT CGG TCT GAA CGA AAC AAT ATG ATG 1356 
Leu Arg Gln Ser Leu Arg Ser Glu Arg Asn Asn Met Met
275 280 285 

AAC ATT GCC AAT GGG CCT CAC CAT CCT AAC CCA CCC CCC 1395 
Asn Ile Ala Asn Gly Pro His His Pro Asn Pro Pro Pro
290 295 300 

GAG AAT GTC CAG CTG GTG AAT CAA TAC GTA TCT AAA AAC 1434 
Glu Asn Val Gln Leu Val Asn Gln Tyr Val Ser Lys Asn

305 310 

GTC ATC TCC AGT GAG CAT ATT GTT GAG AGA GAA GCA GAG 1473 
Val Ile Ser Ser Glu His Ile Val Glu Arg Glu Ala Glu
315 320 325 

ACA TCC TTT TCC ACC AGT CAC TAT ACT TCC ACA GCC CAT 1512 
Thr Ser Phe Ser Thr Ser His Tyr Thr Ser Thr Ala His 
330 335 

CAC TCC ACT ACT GTC ACC CAG ACT CCT AGC CAC AGC TGG 1551 
His Ser Thr Thr Val Thr Gln Thr Pro Ser His Ser Trp
340 345 350 

AGC AAC GGA CAC ACT GAA AGC ATC CTT TCC GAA AGC CAC 1590 
Ser Asn Gly His Thr Glu Ser Ile Leu Ser Glu Ser His
355 360 365 

TCT GTA ATC GTG ATG TCA TCC GTA GAA AAC AGT AGG CAC 1629
Ser Val Ile Val Met Ser Ser Val Glu Asn Ser Arg His

370 375 

AGC AGC CCA ACT GGG GGC CCA AGA GGA CGT CTT AAT GGC 1668 
Ser Ser Pro Thr Gly Gly Pro Arg Gly Arg Leu Asn Gly 
380 385 390 

ACA GGA GGC CCT CGT GAA TGT AAC AGC TTC CTC AGG CAT 1707 
Thr Gly Gly Pro Arg Glu Cys Asn Ser Phe Leu Arg His 
395 400 

GCC AGA GAA ACC CCT GAT TCC TAC CGA GAC TCT CCT CAT 1746 
Ala Arg Glu Thr Pro Asp Ser Tyr Arg Asp Ser Pro His 
405 410 415 

AGT GAA AGG TAAAA CCGAAGGCAA AGCTACTGCA GAGGAGAAAC 1790 
Ser Glu Arg 
420 
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TCAGTCAGAG AATCCCTGTG AGCACCTGCG GTCTCACCTC AGGAAATCTA 1840 
CTCTAATCAG AATAAGGGGC GGCAGTTACC TGTTCTAGGA GTGCTCCTAG 1890
TTGATGAAGT CATCTCTTTG TTTGACGGAA CTTATTTCTT CTGAGCTTCT 1940 

CTCGTCGTCC CAGTGACTGA CAGGCAACAG ACTCTTAAAG AGCTGGGATG 1990 
CTTTGATGCG GAAGGTGCAG CACATGGAGT TTCCAGCTCT GGCCATGGGC 2040 
TCAGACCCAC TCGGGGTCTC AGTGTCCTCA GTTGTAACAT TAGAGAGATG 2090 
GCATCAATGC TTGATAAGGA CCCTTCTATA ATTCCAATTG CCAGTTATCC 2140 
AAACTCTGAT TCGGTGGTCG AGCTGGCCTC GTGTTCTTAT CTGCTAACCC 2190 
TGTCTTACCT TCCAGCCTCA GTTAAGTCAA ATCAAGGGCT ATGTCATTGC 2240 
TGAATGTCAT GGGGGGCAAC TGCTTGCCCT CCACCCTATA GTATCTATTT 2290 
TATGAAATTC CAAGAAGGGA TGAATAAATA AATCTCTTGG ATGCTGCGTC 2340 
TGGCAGTCTT CACGGGTGGT TTTCAAAGCA GAAAAAAAAA AAAAAAAAAA 2390 
AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA A 2431 

FIG. 14D
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